Adaptive frame size support in advanced video codecs

ABSTRACT

Techniques are described related to receiving first and second sub-sequences of a coded video sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution; receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and the first sequence parameter set is different from the second sequence parameter set; and using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.

This application claims the benefit of:

U.S. Provisional Application No. 61/545,525, filed Oct. 10, 2011, and

U.S. Provisional Application No. 61/550,276, filed Oct. 21, 2011, the entire contents of each of which are hereby incorporated by reference.

TECHNICAL FIELD

This disclosure relates to video coding and, more particularly, to techniques for coding video data.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding tree blocks (CTBs), coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
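
Purely as an illustration of the residual, quantization, and scanning steps described above, the following Python sketch forms a residual block, quantizes it with a uniform step size, and scans the quantized two-dimensional block into a one-dimensional vector. The transform step is omitted for brevity, and the step size (qstep), the rounding rule, and the zig-zag order are assumptions chosen for simplicity; HEVC, for example, uses integer transforms, QP-dependent scaling, and diagonal, horizontal, or vertical coefficient scans instead.

```python
from typing import List

Block = List[List[int]]


def residual(original: Block, prediction: Block) -> Block:
    """Sample-wise difference between the block to be coded and its predictive block."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]


def quantize(values: Block, qstep: int) -> Block:
    """Uniform scalar quantization, truncating toward zero (illustrative choice)."""
    return [[int(v / qstep) for v in row] for row in values]


def zigzag_scan(block: Block) -> List[int]:
    """Scan an N x N block into a 1-D list along anti-diagonals, alternating direction."""
    n = len(block)
    out: List[int] = []
    for s in range(2 * n - 1):
        # All (row, col) positions on anti-diagonal s, ordered by increasing row.
        coords = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            coords.reverse()  # even diagonals are read from bottom-left to top-right
        out.extend(block[i][j] for i, j in coords)
    return out


if __name__ == "__main__":
    res = residual([[10, 12], [14, 16]], [[8, 8], [8, 8]])
    print(zigzag_scan(quantize(res, 2)))  # prints [1, 2, 3, 4]
```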

SUMMARY

In general, this disclosure describes techniques for coding video sequences that include frames, or “pictures,” having different spatial resolutions. One aspect of this disclosure includes using multiple sequence parameter sets in a single resolution-adaptive coded video sequence to indicate the resolution of a sequence of pictures in the coded video. As one example, the resolution-adaptive coded video sequence may comprise two or more coded sub-sequences, wherein each sub-sequence may comprise a set of pictures that have a common spatial resolution and refer to the same active sequence parameter set. Another aspect of this disclosure includes a novel activation process for activating a sequence parameter set when multiple sequence parameter sets are used in a single resolution-adaptive coded video sequence, as described above.

Yet another aspect of this disclosure includes novel techniques for managing a decoded picture buffer (DPB). As one example, the size of a DPB is not indicated using a number of frame buffers (e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size), as in some techniques, but rather using a different unit of size. As another example, before a decoded picture is inserted into a DPB, the availability of the DPB to store the decoded picture is determined based on the spatial resolution of the decoded picture to be inserted, so as to ensure that the DPB includes sufficient empty buffer space for inserting the decoded picture. As still another example, after a decoded picture is removed from a DPB, the availability of the DPB to store a subsequent decoded picture is determined based on the spatial resolution of the removed decoded picture and the spatial resolution of the subsequent decoded picture to be inserted into the DPB. In other words, upon removal of a decoded picture, the proportion of the DPB unavailable to store decoded pictures, or the “fullness” of the DPB, is not decreased by an amount corresponding to a single decoded picture of a fixed size, as in some techniques, but rather by a varying amount that depends on the spatial resolution of the removed decoded picture.
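
A minimal sketch of the resolution-aware DPB accounting described above follows, assuming, for illustration only, that DPB capacity and fullness are counted in luma samples (the disclosure does not fix the unit of size). The class SampleBasedDPB and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SampleBasedDPB:
    """Hypothetical DPB whose capacity and fullness are measured in luma samples."""
    capacity_samples: int
    fullness: int = 0
    sizes: Dict[int, int] = field(default_factory=dict)  # picture id -> samples held

    def can_insert(self, width: int, height: int) -> bool:
        # Availability depends on the resolution of the picture to be inserted.
        return self.fullness + width * height <= self.capacity_samples

    def insert(self, pic_id: int, width: int, height: int) -> None:
        if not self.can_insert(width, height):
            raise RuntimeError("insufficient DPB space for this resolution")
        self.sizes[pic_id] = width * height
        self.fullness += width * height

    def remove(self, pic_id: int) -> None:
        # Fullness drops by the size of the removed picture, not by one "frame".
        self.fullness -= self.sizes.pop(pic_id)


if __name__ == "__main__":
    dpb = SampleBasedDPB(capacity_samples=2 * 1920 * 1080)
    dpb.insert(0, 1920, 1080)
    dpb.insert(1, 960, 540)
    dpb.insert(2, 960, 540)
    dpb.remove(1)
    print(dpb.can_insert(1920, 1080))  # False: one small removal is not enough
    dpb.remove(2)
    print(dpb.can_insert(1920, 1080))  # True
```

In this sketch, removing a 960x540 picture frees a quarter of the space that removing a 1920x1080 picture would, so the availability check depends on the resolutions of both the removed and the incoming pictures rather than on a count of frame buffers.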

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure.

FIG. 2 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 3 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

FIGS. 4A-4D are conceptual diagrams illustrating an example video sequence that includes a plurality of pictures that are encoded and transmitted in accordance with the techniques of this disclosure.

FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure.

FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure.

FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer in accordance with the techniques of this disclosure.

DETAILED DESCRIPTION

The techniques of this disclosure are generally related to techniques for using multiple sequence parameter sets (SPSs) for communicating video data at different resolutions, and to techniques for managing the multiple SPSs. In the current High Efficiency Video Coding (HEVC) design, pictures in the same coded video sequence (CVS) have the same size, wherein the size is signaled in a sequence parameter set (SPS) for the CVS. Additional syntax information for the CVS that is also signaled in the SPS includes the Largest Coding Unit (LCU) size and the Smallest Coding Unit (SCU) size, which define the largest and the smallest block, or coding unit, size for each picture, respectively. In the context of H.264/AVC and HEVC, a CVS may refer to a sequence of coded pictures starting from an instantaneous decoding refresh (IDR) picture up to, but not including, the next IDR picture in decoding order, or up to the end of the coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.
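
The CVS boundaries described above can be sketched as follows. For simplicity, the sketch assumes each picture is labeled only as "IDR" or non-IDR with a plain string (a real bitstream signals this through NAL unit types), and the function name split_into_cvss is hypothetical.

```python
from typing import List


def split_into_cvss(picture_types: List[str]) -> List[List[int]]:
    """Group picture indices (in decoding order) into coded video sequences.

    Each CVS starts at an IDR picture and runs up to, but not including, the
    next IDR picture, or to the end of the bitstream.
    """
    if picture_types and picture_types[0] != "IDR":
        raise ValueError("this sketch assumes the bitstream starts with an IDR picture")
    cvss: List[List[int]] = []
    for index, ptype in enumerate(picture_types):
        if ptype == "IDR":
            cvss.append([])  # an IDR picture opens a new CVS
        cvss[-1].append(index)
    return cvss


if __name__ == "__main__":
    print(split_into_cvss(["IDR", "P", "B", "IDR", "P"]))  # [[0, 1, 2], [3, 4]]
```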

However, HEVC may support resolution-adaptive video sequences that include frames with different resolutions. One method for adaptive frame size support is described in JCTVC-F158: Resolution switching for coding efficiency and resilience, Davies, 6th Meeting, Turin, IT, 14-22 Jul. 2011, referred to as JCTVC-F158 hereinafter.

To support resolution-adaptive video, this disclosure describes techniques for coding multiple SPSs. Each SPS of the multiple SPSs may include information related to a sequence of pictures that has a different resolution. This disclosure also introduces a new sequence, referred to as a resolution sub-sequence (RSS), that may refer back to one of the multiple SPSs in order to indicate the resolution of a sequence of pictures. This disclosure also describes techniques for activating a single SPS when multiple parameter sets may be utilized within a single CVS, as well as different techniques and orders for transmitting the different SPSs.
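
The following Python sketch illustrates how pictures in one CVS might refer to different SPSs, each carrying its own resolution and LCU size. It assumes, hypothetically, that each picture directly carries the id of the SPS it refers to (in HEVC this reference is made through the picture parameter set) and that a different SPS is activated whenever the referenced id changes; this activation rule is an illustrative assumption, not necessarily the activation process of this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SPS:
    sps_id: int
    width: int
    height: int
    lcu_size: int


def decode_with_multiple_sps(
    sps_table: Dict[int, SPS], pictures: List[Tuple[int, int]]
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """pictures: (picture id, referenced sps_id) pairs in decoding order.

    Returns the (picture id, width, height) used for each picture and the ids
    of the pictures at which a different SPS was activated.
    """
    active: Optional[SPS] = None
    activations: List[int] = []
    resolutions: List[Tuple[int, int, int]] = []
    for pic_id, sps_id in pictures:
        if active is None or active.sps_id != sps_id:
            active = sps_table[sps_id]  # KeyError if the SPS was never received
            activations.append(pic_id)
        resolutions.append((pic_id, active.width, active.height))
    return resolutions, activations
```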

The techniques of this disclosure are also related to techniques for managing a decoded picture buffer (DPB). For example, a video coder (e.g., a video encoder or a video decoder) includes a DPB. The DPB stores decoded pictures, including reference pictures. Reference pictures are pictures that can potentially be used for inter-predicting a picture. In other words, the video coder may predict a picture, during coding (encoding or decoding) of that picture, based on one or more reference pictures stored in the DPB.

Decoded pictures used for predicting subsequent coded pictures, and forfuture output, are buffered in a Decoded Picture Buffer (DPB).

To efficiently utilize the memory of a DPB, DPB management processes are specified, including a storage process of decoded pictures into the DPB, a marking process of reference pictures, and output and removal processes of decoded pictures from the DPB. DPB management includes at least the following aspects: (1) picture identification and reference picture identification; (2) reference picture list construction; (3) reference picture marking; (4) picture output from the DPB; (5) picture insertion into the DPB; and (6) picture removal from the DPB. A brief introduction to reference picture marking and reference picture list construction is included below.

Each CVS may include a number of reference pictures, which may be used to predict pixel values of other pictures (e.g., pictures that come before or after the reference picture). A video coder marks each reference picture and stores the reference picture in the DPB. In previous video coding standards, such as H.264/AVC, the active sequence parameter set specifies a maximum number, referred to as M (num_ref_frames), of reference pictures used for inter-prediction that may be held in the DPB. When a reference picture is decoded, the reference picture is marked as “used for reference.” If the decoding of the reference picture caused more than M pictures to be marked as “used for reference,” at least one picture must be marked as “unused for reference.” The DPB removal process then removes pictures marked as “unused for reference” from the DPB if they are not also needed for output.
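
The marking limit and removal condition described above can be expressed as two small predicates. This is a sketch only; the DecodedPicture type and the function names are hypothetical, and the choice of which picture to mark as "unused for reference" when the limit is exceeded depends on the marking mode and is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedPicture:
    pic_id: int
    used_for_reference: bool
    needed_for_output: bool


def reference_limit_exceeded(dpb: List[DecodedPicture], m: int) -> bool:
    """True if more than M pictures are marked "used for reference"."""
    return sum(p.used_for_reference for p in dpb) > m


def remove_unneeded(dpb: List[DecodedPicture]) -> List[DecodedPicture]:
    """Keep only pictures still used for reference or still awaiting output."""
    return [p for p in dpb if p.used_for_reference or p.needed_for_output]
```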

When a picture is decoded, the decoded picture may be either a non-reference picture or a reference picture. A reference picture may be a long-term reference picture or a short-term reference picture, and when the decoded picture is marked as “unused for reference,” it is no longer needed for reference. In some video coding standards, there may be reference picture marking operations that change the status of the reference pictures.

There may be at least two operation modes for reference picture marking: a sliding window operation mode and an adaptive memory control operation mode. The operation mode for reference picture marking may be selected on a per-picture basis. The sliding window operation mode may work as a first-in-first-out queue with a fixed number of short-term reference pictures. In other words, the short-term reference pictures with the earliest decoding time may be the first to be removed (marked as not used for reference), in an implicit fashion.
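
A minimal sketch of the sliding window operation mode follows. It assumes, for simplicity, that all reference pictures are short-term and that pictures are identified by their decoding order; long-term pictures and adaptive memory control are not modeled, and the class name SlidingWindow is hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class SlidingWindow:
    """FIFO of short-term reference pictures, identified by decoding order."""

    def __init__(self, num_short_term: int) -> None:
        if num_short_term < 1:
            raise ValueError("the window must hold at least one reference picture")
        self.capacity = num_short_term
        self.window: Deque[int] = deque()

    def add_reference(self, decode_idx: int) -> Optional[int]:
        """Mark a newly decoded picture as a short-term reference.

        Returns the decoding index of the picture implicitly marked as unused
        for reference, or None if the window was not yet full.
        """
        evicted = None
        if len(self.window) == self.capacity:
            evicted = self.window.popleft()  # earliest decoding time leaves first
        self.window.append(decode_idx)
        return evicted
```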

The video coder may also be tasked with constructing reference picture lists that indicate which reference pictures may be used for inter-prediction purposes. Two of these reference picture lists are referred to as List 0 and List 1, respectively. The video coder first employs default construction techniques (e.g., preconfigured construction schemes) to construct List 0 and List 1. Optionally, after the initial List 0 and List 1 are constructed, the video decoder may decode syntax elements, when present, that instruct the video decoder to modify the initial List 0 and List 1.
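
As an illustration only, the following sketch builds initial List 0 and List 1 from picture order count (POC) values and then applies an optional modification. The ordering used here (List 0 starts with the closest preceding pictures, List 1 with the closest following pictures) resembles the defaults of H.264/AVC and HEVC but is an assumption, as is representing the modification as a list of indices into the initial list; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def default_lists(current_poc: int, ref_pocs: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Build initial List 0 and List 1 ordered by POC distance from the current picture."""
    before = sorted((p for p in ref_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in ref_pocs if p > current_poc)
    list0 = before + after  # closest preceding pictures first
    list1 = after + before  # closest following pictures first
    return list0, list1


def modify_list(initial: List[int], entry_indices: Optional[List[int]] = None) -> List[int]:
    """Optionally rebuild a list from signaled indices into the initial list."""
    if entry_indices is None:
        return list(initial)
    return [initial[i] for i in entry_indices]
```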

The video encoder may signal syntax elements that are indicative of identifier(s) of reference pictures in the DPB, and the video encoder may also signal syntax elements that include indices, within List 0, List 1, or both List 0 and List 1, that indicate which reference picture or pictures to use to decode a coded block of a current picture. The video decoder, in turn, uses the received identifier to identify the index value or values for a reference picture or reference pictures listed in List 0, List 1, or both List 0 and List 1. From the index value(s) as well as the identifier(s) of the reference picture or reference pictures, the video decoder retrieves the reference picture or reference pictures, or part(s) thereof, from the DPB, and decodes the coded block of the current picture based on the retrieved reference picture or pictures and one or more motion vectors that identify blocks within the reference picture or pictures that are used for decoding the coded block.
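
The following sketch illustrates how a decoder might use a reference index and a motion vector to retrieve a predictive block from the DPB. It assumes, hypothetically, that the DPB is a dictionary keyed by POC, that motion vectors have integer-sample precision (HEVC uses fractional-sample interpolation), and that out-of-picture positions are clamped to the nearest edge sample; the function name fetch_prediction is hypothetical.

```python
from typing import Dict, List, Tuple

Picture = List[List[int]]


def fetch_prediction(
    ref_list: List[int],
    dpb: Dict[int, Picture],
    ref_idx: int,
    x: int,
    y: int,
    mv: Tuple[int, int],
    size: int,
) -> Picture:
    """Return the size x size predictive block for the block at (x, y).

    ref_idx selects an entry of the reference picture list, whose identifier
    (here a POC) locates the picture in the DPB; mv is an integer-sample
    displacement. Positions outside the picture are clamped to the edge.
    """
    picture = dpb[ref_list[ref_idx]]
    height, width = len(picture), len(picture[0])
    dx, dy = mv
    return [
        [
            picture[min(max(y + dy + r, 0), height - 1)][min(max(x + dx + c, 0), width - 1)]
            for c in range(size)
        ]
        for r in range(size)
    ]
```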

In the context of AVC and HEVC, a coded video sequence (CVS) refers to a sequence of coded frames, or “pictures,” ranging from an instantaneous decoding refresh (IDR) picture to another IDR picture, exclusive, in decoding order, or to the end of the coded video bitstream if the starting IDR picture is the last IDR picture in the coded video bitstream.

However, when coding a single CVS comprising pictures having at least two different spatial resolutions using some HEVC-based solutions, e.g., as described in JCTVC-F158, using a DPB whose size is measured in pictures may cause a number of issues, which are described below.

First, a sub-sequence of pictures with one resolution may have different coding parameters, such as an LCU size, than another sub-sequence of pictures with another, different resolution. Accordingly, it may not be sufficient to use a single active SPS to describe characteristics of a CVS comprising the sub-sequences of pictures with the different resolutions.
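
To illustrate why a per-sub-sequence SPS can matter, the following sketch computes how many LCUs cover a picture. The grid depends on both the resolution and the LCU size, so sub-sequences with different resolutions may favor different LCU sizes. The function name lcu_grid is hypothetical.

```python
from typing import Tuple


def lcu_grid(width: int, height: int, lcu_size: int) -> Tuple[int, int]:
    """Number of LCU columns and rows needed to cover a picture (ceiling division)."""
    cols = (width + lcu_size - 1) // lcu_size
    rows = (height + lcu_size - 1) // lcu_size
    return cols, rows
```

For example, a 1920x1080 picture with 64x64 LCUs and a 960x540 picture with 32x32 LCUs both yield a 30x17 grid, whereas the 960x540 picture with 64x64 LCUs yields only 15x9.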

Furthermore, different sub-sequences of a CVS may have reference pictures of different sizes, that is, different spatial resolutions. Accordingly, a single set of parameters included in an SPS for the CVS, e.g., max_num_ref_frames, may be optimal for one sub-sequence but sub-optimal for the other sub-sequences included in the CVS.

Additionally, some techniques for DPB management may no longer be effective when coding a single CVS that includes pictures having different resolutions. As one example, because pictures having different resolutions may have different sizes, the size of a DPB used to store the pictures can no longer be indicated using a number of frame buffers, e.g., a number of storage locations each capable of storing a frame, or “picture,” of a fixed size.
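
The size mismatch described above can be quantified with a short calculation of the memory needed to store one decoded picture. The sketch assumes planar storage with one byte per sample for bit depths up to 8 and two bytes per sample up to 16; the function name picture_bytes and the CHROMA_DIVISORS table are hypothetical.

```python
from typing import Dict, Tuple

# Horizontal and vertical chroma subsampling factors per chroma format.
CHROMA_DIVISORS: Dict[str, Tuple[int, int]] = {"4:2:0": (2, 2), "4:2:2": (2, 1), "4:4:4": (1, 1)}


def picture_bytes(width: int, height: int, chroma_format: str = "4:2:0", bit_depth: int = 8) -> int:
    """Bytes needed to store one decoded picture (one luma and two chroma planes)."""
    sub_w, sub_h = CHROMA_DIVISORS[chroma_format]
    chroma_w = -(-width // sub_w)   # ceiling division for odd dimensions
    chroma_h = -(-height // sub_h)
    samples = width * height + 2 * chroma_w * chroma_h
    bytes_per_sample = (bit_depth + 7) // 8
    return samples * bytes_per_sample
```

Under these assumptions, a 4:2:0, 8-bit 1920x1080 picture needs 3,110,400 bytes while a 960x540 picture needs 777,600 bytes, so one frame buffer sized for the former has the capacity of four of the latter.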

Furthermore, to insert a decoded picture into the DPB, the DPB must include an empty frame buffer that is sufficiently large to store the decoded picture. However, once again, because pictures having different resolutions may have different sizes, a frame buffer of a fixed size may not correspond to the size of a particular decoded picture to be inserted. Accordingly, merely determining whether the DPB includes an empty frame buffer of a fixed size may be insufficient to determine whether the DPB is available to store the decoded picture. As one example, the DPB may have less buffer space than is required to store the decoded picture.

Similarly, after a decoded picture whose resolution corresponds to a size different from the size of the frame buffer is removed from the DPB, merely determining that the decoded picture has been removed from the DPB may be insufficient to determine whether the DPB is actually available to store a subsequent decoded picture having a particular resolution. Furthermore, this determination also does not indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.

In another example, a single empty frame buffer of a fixed size may exist within the DPB, and the DPB may store decoded picture(s) having a particular resolution in the frame buffer. However, if a video coder removes a decoded picture from the DPB, and the removed picture has a resolution that is smaller than the size of the frame buffer, sufficient buffer space may exist within the DPB to insert a decoded picture with a resolution that corresponds to a size that is larger than the size of the removed decoded picture. Accordingly, merely determining that a particular decoded picture has been removed from the DPB may be insufficient to indicate the actual buffer space that may be available within the DPB for storing additional decoded pictures.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize techniques described in this disclosure. In general, a reference picture set is defined as a set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order and that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. In some examples, the reference pictures that are prior to the associated picture may be reference pictures until the next instantaneous decoding refresh (IDR) picture or broken link access (BLA) picture. In other words, reference pictures in the reference picture set may all be prior to the current picture in decoding order. Also, the reference pictures in the reference picture set may be used for inter-predicting the current picture and/or inter-predicting any picture following the current picture in decoding order until the next IDR picture or BLA picture.

For example, some of the reference pictures in the reference picture set are reference pictures that can potentially be used to inter-predict a block of the current picture, and not pictures following the current picture in decoding order. Some of the reference pictures in the reference picture set are reference pictures that can potentially be used to inter-predict a block of the current picture, and blocks in one or more pictures following the current picture in decoding order. Some of the reference pictures in the reference picture set are reference pictures that can potentially be used to inter-predict blocks in one or more pictures following the current picture in decoding order, and cannot be used to inter-predict a block in the current picture.
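
The three categories of reference pictures described above can be sketched as a classification over the pictures that may use each reference picture. The input format (decoding indices of potential users), the category names, and the function name classify_reference_pictures are hypothetical, chosen only to mirror the text.

```python
from typing import Dict, List


def classify_reference_pictures(
    current: int, potential_users: Dict[int, List[int]]
) -> Dict[str, List[int]]:
    """Classify reference pictures by which pictures may use them.

    current: decoding index of the current picture.
    potential_users: for each reference picture (by decoding index), the
    decoding indices of pictures that may use it for inter prediction.
    """
    result: Dict[str, List[int]] = {
        "current_only": [],
        "current_and_following": [],
        "following_only": [],
    }
    for ref, users in sorted(potential_users.items()):
        if ref >= current:
            raise ValueError("reference pictures must precede the current picture")
        by_current = current in users
        by_following = any(u > current for u in users)
        if by_current and by_following:
            result["current_and_following"].append(ref)
        elif by_current:
            result["current_only"].append(ref)
        elif by_following:
            result["following_only"].append(ref)
        # A picture used by neither would not belong to the reference picture set.
    return result
```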

As used in this disclosure, reference pictures that can potentially be used for inter-prediction refer to reference pictures that can be used for inter-prediction, but do not necessarily have to be used for inter-prediction. For example, the reference picture set may identify reference pictures that can potentially be used for inter-prediction. However, this does not mean that all of the identified reference pictures must be used for inter-prediction. Rather, one or more of these identified reference pictures could be used for inter-prediction, but all do not necessarily have to be used for inter-prediction.

As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video for decoding by destination device 14. Source device 12 and destination device 14 may each be an example of a video coding device. Source device 12 may transmit the encoded video to destination device 14 via communication channel 16 or may store the encoded video on a storage medium 17 or a file server 19, such that the encoded video may be accessed by the destination device 14 as desired.

Source device 12 and destination device 14 may comprise any of a wide range of devices, including a wireless handset such as so-called “smart” phones, so-called “smart” pads, or other such wireless devices equipped for wireless communication. Additional examples of source device 12 and destination device 14 include, but are not limited to, a digital television, a device in a digital direct broadcast system, a device in a wireless broadcast system, a personal digital assistant (PDA), a laptop computer, a desktop computer, a tablet computer, an e-book reader, a digital camera, a digital recording device, a digital media player, a video gaming device, a video game console, a cellular radio telephone, a satellite radio telephone, a video teleconferencing device, a video streaming device, a wireless communication device, or the like.

As indicated above, in many cases, source device 12 and/or destination device 14 may be equipped for wireless communication. Hence, communication channel 16 may comprise a wireless channel, a wired channel, or a combination of wireless and wired channels suitable for transmission of encoded video data. Similarly, the file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server.

The techniques of this disclosure, however, may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes a video source 18, video encoder 20, a modulator/demodulator (MODEM) 22 and an output interface 24. In source device 12, video source 18 may include a source such as a video capture device, such as a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may be modulated by modem 22 according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14 via output interface 24. Modem 22 may include various mixers, filters, amplifiers or other components designed for signal modulation. Output interface 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.

The captured, pre-captured, or computer-generated video that is encoded by the video encoder 20 may also be stored onto a storage medium 17 or a file server 19 for later consumption. The storage medium 17 may include Blu-ray discs, DVDs, CD-ROMs, flash memory, or any other suitable digital storage media for storing encoded video. The encoded video stored on the storage medium 17 may then be accessed by destination device 14 for decoding and playback.

File server 19 may be any type of server capable of storing encoded video and transmitting that encoded video to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, a local disk drive, or any other type of device capable of storing encoded video data and transmitting it to a destination device. The transmission of encoded video data from the file server 19 may be a streaming transmission, a download transmission, or a combination of both. The file server 19 may be accessed by the destination device 14 through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, Ethernet, USB, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server.

Destination device 14, in the example of FIG. 1, includes an input interface 26, a modem 28, a video decoder 30, and a display device 32. Input interface 26 of destination device 14 receives information over channel 16, as one example, or from storage medium 17 or file server 19, as alternate examples, and modem 28 demodulates the information to produce a demodulated bitstream for video decoder 30. The demodulated bitstream may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding video data. Such syntax may also be included with the encoded video data stored on a storage medium 17 or a file server 19. As one example, the syntax may be embedded with the encoded video data, although aspects of this disclosure should not be considered limited to such a requirement. The syntax information defined by video encoder 20, which is also used by video decoder 30, may include syntax elements that describe characteristics and/or processing of video blocks, such as coding tree units (CTUs), coding tree blocks (CTBs), prediction units (PUs), coding units (CUs) or other units of coded video, e.g., video slices, video pictures, and video sequences or groups of pictures (GOPs). Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (CODEC) that is capable of encoding or decoding video data.

Display device 32 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

In the example of FIG. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media. Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, there is a new video coding standard, namely the High Efficiency Video Coding (HEVC) standard, presently under development by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). A recent Working Draft (WD) of HEVC, referred to as HEVC WD8 hereinafter, is available, as of Jul. 20, 2012, from http://phenix.int-evry.fr/jct/doc_end_user/documents/10_Stockholm/wg11/JCTVC-J1003-v8.zip.

The techniques of this disclosure, however, are not limited to any particular coding standard. For purposes of illustration only, the techniques are described in accordance with the HEVC standard.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more processors including microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure.

Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. In some instances, video encoder 20 and video decoder 30 may be commonly referred to as a video coder that codes information (e.g., pictures and syntax elements). The coding of information may refer to encoding when the video coder corresponds to video encoder 20. The coding of information may refer to decoding when the video coder corresponds to video decoder 30.

FIG. 2 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes.

In the example of FIG. 2, video encoder 20 includes a partitioning unit 35, prediction processing unit 41, summer 50, transform processing unit 52, quantization unit 54, entropy encoding unit 56, decoded picture buffer (DPB) 64, and DPB management unit 65. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra prediction unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter.

As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.

Intra prediction unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same picture or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices or B slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in decoded picture buffer 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
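As a minimal illustration of the block-matching metrics named above, the Python sketch below computes SAD and SSD over two equally sized blocks (lists of pixel rows) and picks the candidate with the lowest cost. The function names `sad`, `ssd`, and `best_match` are hypothetical, and the comparison over a short list of candidates stands in for a real motion search, which would evaluate candidates at integer and fractional positions inside a search window.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(current, candidates, metric=sad):
    """Index of the candidate predictive block that minimises `metric`."""
    costs = [metric(current, cand) for cand in candidates]
    return min(range(len(costs)), key=costs.__getitem__)


current = [[10, 12], [11, 13]]
candidates = [
    [[0, 0], [0, 0]],
    [[10, 12], [11, 14]],
    [[20, 20], [20, 20]],
]
print(best_match(current, candidates))   # 1
print(ssd(current, candidates[2]))        # 294
```

SSD penalizes a few large errors more heavily than SAD, while SAD is cheaper to compute; either can be passed as `metric`.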

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in decoded picture buffer 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra-prediction unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
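The rate-distortion selection described above can be sketched as follows. This sketch swaps in the widely used Lagrangian formulation, cost J = D + λ·R, rather than the ratio computation mentioned in the text; the `ModeTrial` record, the λ values, and the trial numbers are all hypothetical and only show how a larger λ shifts the choice toward cheaper modes.

```python
from dataclasses import dataclass


@dataclass
class ModeTrial:
    mode: int          # hypothetical intra-prediction mode index
    distortion: float  # e.g. SSD between original and reconstructed block
    bits: int          # bits needed to code the block in this mode


def select_mode(trials, lam):
    """Return the mode whose Lagrangian cost D + lam * R is smallest."""
    best = min(trials, key=lambda t: t.distortion + lam * t.bits)
    return best.mode


trials = [ModeTrial(0, 100.0, 20), ModeTrial(1, 60.0, 40), ModeTrial(26, 80.0, 25)]
print(select_mode(trials, 1.0))  # 1  (costs 120, 100, 105)
print(select_mode(trials, 3.0))  # 26 (costs 160, 180, 155)
```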

After selecting an intra-prediction mode for a block, intra-prediction unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

After prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
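To make the transform step concrete, here is a floating-point, orthonormal 2-D DCT-II applied separably (rows, then columns). It is only a conceptual sketch: practical codecs such as HEVC use integer approximations of the DCT, and the helper names `dct_1d` and `dct_2d` are hypothetical. A flat residual block compacts all of its energy into the single DC coefficient, which is what makes the later quantization and entropy coding effective.

```python
import math


def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform rows, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


residual = [[5] * 4 for _ in range(4)]   # flat 4x4 residual block
coeffs = dct_2d(residual)
print(round(coeffs[0][0], 6))            # 20.0 (all AC terms are ~0)
```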

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
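A minimal sketch of scalar quantization controlled by a quantization parameter (QP) follows. It assumes the H.264/AVC-style relationship in which the quantizer step size doubles for every increase of 6 in QP (step 1.0 at QP 4), and it uses plain round-to-nearest; real encoders use integer scaling tables and tuned rounding offsets, and the function names here are hypothetical.

```python
import math


def qstep(qp):
    """Approximate quantizer step size: doubles every 6 QP, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (round to nearest)."""
    step = qstep(qp)
    return [int(math.copysign(math.floor(abs(c) / step + 0.5), c)) for c in coeffs]


def dequantize(levels, qp):
    """Reconstruct approximate coefficient values from levels."""
    step = qstep(qp)
    return [level * step for level in levels]


levels = quantize([17.0, -9.0, 3.0, 0.4], qp=22)   # step size 8.0
print(levels)                                       # [2, -1, 0, 0]
print(dequantize(levels, qp=22))                    # [16.0, -8.0, 0.0, 0.0]
```

Raising QP enlarges the step, so more coefficients collapse to zero and the bit depth of the remaining levels shrinks, at the cost of larger reconstruction error.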

Following quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in decoded picture buffer 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.

In accordance with this disclosure, prediction processing unit 41 represents one example unit for performing the example functions described above. For example, prediction processing unit 41 may encode syntax elements that support the use of adaptive resolution CVSs. Prediction processing unit 41 may also generate SPSs that may be activated by one or more resolution sub-sequences, and transmit the SPSs and RSSs to a video decoder. Each of the SPSs may include resolution information for one or more sequences of pictures. Prediction processing unit 41 may also receive and order one or more SPSs and cause video encoder 20 to code information indicative of the reference pictures that belong to the reference picture set. In addition, DPB management unit 65 may also perform techniques related to the management of DPB 64.

Also, during the reconstruction process (e.g., the process used to reconstruct a picture for use as a reference picture and storage in DPB 64), prediction processing unit 41 may construct the plurality of reference picture subsets that each identify one or more of the reference pictures. Prediction processing unit 41 may also derive the reference picture set from the constructed plurality of reference picture subsets. Also, prediction processing unit 41 and DPB management unit 65 may implement any one or more of the sets of example pseudo code described below to implement one or more example techniques described in this disclosure.

In accordance with the techniques of this disclosure, prediction processing unit 41 may generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution. The second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution. Prediction processing unit 41 may further generate a first sequence parameter set and a second sequence parameter set for the video sequence. The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence. Also, the first sequence parameter set may be different than the second sequence parameter set. Prediction processing unit 41 may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, as well as the first sequence parameter set and the second sequence parameter set. In some examples, the resolution may comprise a spatial resolution.

Prediction processing unit 41 may also alter the coding of the sequence parameter sets. For example, prediction processing unit 41 may code the first sequence parameter set and the second sequence parameter set in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence. Prediction processing unit 41 may also interleave, in the coded video sequence, the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence.

In some examples, to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, prediction processing unit 41 may be configured to transmit both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence. In another example, to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, prediction processing unit 41 may be configured to transmit the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.

In some examples, prediction processing unit 41 may code the first sequence parameter set in a transmitted bitstream prior to coding the first sub-sequence, and prediction processing unit 41 may also code the second sequence parameter set in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
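Both transmission orders described above satisfy one simple constraint: every frame must be preceded by the SPS it refers to. The sketch below checks that constraint over a hypothetical, simplified bitstream model in which each unit is a `("SPS", sps_id)` or `("FRAME", sps_id)` tuple; actual bitstreams carry these as NAL units, which this sketch does not model.

```python
def sps_precedes_frames(units):
    """True if every ("FRAME", sps_id) unit follows an ("SPS", sps_id) unit."""
    seen = set()
    for kind, sps_id in units:
        if kind == "SPS":
            seen.add(sps_id)
        elif kind == "FRAME" and sps_id not in seen:
            return False
    return True


# Both SPSs up front, frames of the two sub-sequences interleaved.
upfront = [("SPS", 0), ("SPS", 1), ("FRAME", 0), ("FRAME", 1), ("FRAME", 0)]
# Second SPS sent after some frames of the first sub-sequence.
deferred = [("SPS", 0), ("FRAME", 0), ("FRAME", 0), ("SPS", 1), ("FRAME", 1)]
# Invalid: a frame of the second sub-sequence arrives before its SPS.
too_early = [("SPS", 0), ("FRAME", 0), ("FRAME", 1), ("SPS", 1)]
print([sps_precedes_frames(u) for u in (upfront, deferred, too_early)])  # [True, True, False]
```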

Decoded picture buffer 64, DPB management unit 65, and video encoder 20 may also perform the techniques of this disclosure. In some examples, decoded picture buffer 64 may receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution; determine, based on the first resolution, whether decoded picture buffer 64 is available to store the first decoded frame; in the event decoded picture buffer 64 is available to store the first decoded frame, store the first decoded frame in decoded picture buffer 64; and determine, based on the first resolution and a second resolution, whether decoded picture buffer 64 is available to store a second decoded frame of video data that is associated with the second resolution, wherein the first decoded frame is different than the second decoded frame.

In some additional examples, DPB management unit 65 may determine an amount of information that may be stored within decoded picture buffer 64, determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within decoded picture buffer 64 with the amount of information associated with the first decoded frame.

In one example, to determine whether decoded picture buffer 64 is available to store the second decoded frame based on the first resolution and the second resolution, DPB management unit 65 may be configured to determine an amount of information that may be stored within decoded picture buffer 64 based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within decoded picture buffer 64 with the amount of information associated with the second decoded frame. DPB management unit 65 may also be configured to remove the first decoded frame from decoded picture buffer 64. In some examples, the resolution may comprise a spatial resolution.

The techniques described in this disclosure may refer to video encoder 20 signaling information. When video encoder 20 signals information, the techniques of this disclosure generally refer to any manner in which video encoder 20 provides the information in a coded bitstream. For example, when video encoder 20 signals syntax elements to video decoder 30, it may mean that video encoder 20 transmitted the syntax elements to video decoder 30 as part of a coded bitstream via output interface 24 and communication channel 16, or that video encoder 20 stored the syntax elements in a coded bitstream on storage medium 17 and/or file server 19 for eventual reception by video decoder 30. In this way, signaling from video encoder 20 to video decoder 30 should not be interpreted as requiring transmission directly from video encoder 20 to video decoder 30, although this may be one possibility for real-time video applications. In other examples, however, signaling from video encoder 20 to video decoder 30 should be interpreted as any technique with which video encoder 20 provides information in a bitstream for eventual reception by video decoder 30, either directly or via an intermediate storage (e.g., in storage medium 17 and/or file server 19).

Video encoder 20 and video decoder 30 may be configured to implement the example techniques described in this disclosure for coding, transmitting, receiving, and activating SPSs and RSSs, as well as for managing the DPB. For example, video encoder 20 may invoke the techniques to support adaptive resolution CVSs and to add and remove reference pictures from the DPB. Video decoder 30 may invoke the process in a similar manner.

To support multiple SPSs in a single adaptive-resolution CVS, prediction processing unit 41 may utilize RSSs. Each RSS may indicate information such as the resolution of a series of coded video pictures of a CVS. Prediction processing unit 41 may use one resolution sub-sequence (RSS) at a given time. Each RSS may reference a single SPS. As an example, if there are “n” RSSs in a given CVS, there may be, altogether, “n” active SPSs when decoding the CVS. However, in some examples, multiple RSSs may refer to a single SPS in a CVS. The SPS or PPS may indicate the different resolution of each RSS. The SPS or PPS may include a resolution ID as well as a syntax element that indicates the resolution associated with each resolution ID.

In accordance with the techniques of this disclosure, a computer-readable storage medium may include a data structure that represents CVSs, SPSs, and RSSs. In particular, the data structure may include a coded video sequence comprising a first sub-sequence and a second sub-sequence. The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may also be different than the second sub-sequence, and the first resolution may be different than the second resolution. The data structure may further comprise a first sequence parameter set and a second sequence parameter set for the coded video sequence. The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence, and the first sequence parameter set may be different than the second sequence parameter set.

Prediction processing unit 41 of video encoder 20 may order or restrict each of the RSSs according to the spatial resolution characteristics of each RSS. In general, prediction processing unit 41 may order the SPSs based on their horizontal resolutions. As an example, if the horizontal size of a resolution “A” of an SPS is greater than that of a resolution “B” of an SPS, the vertical size of resolution “A” may not be less than that of resolution “B.” With this restriction, a resolution “C” of an SPS may be considered larger than a resolution “D” of an SPS as long as either the horizontal size or the vertical size of resolution “C” is greater than the corresponding size of resolution “D.” Video encoder 20 may assign the RSS with the largest spatial resolution a resolution ID equal to “0,” the RSS with the second largest spatial resolution a resolution ID equal to “1,” and so forth.

In some examples, prediction processing unit 41 may not signal a resolution ID. Rather, video encoder 20 may derive the resolution ID according to the spatial resolutions of the RSSs. Prediction processing unit 41 may still order each of the RSSs in each CVS according to the spatial resolutions of each RSS, as described above. The RSS with the largest spatial resolution is assigned a resolution ID equal to 0, the RSS with the second largest spatial resolution is assigned a resolution ID equal to 1, and so on.
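The ordering restriction and the implicit assignment of resolution IDs can be sketched as below. The sketch, whose function names are hypothetical, rejects any pair of resolutions in which one is wider but shorter than the other, sorts the remaining distinct (width, height) pairs from largest to smallest, and numbers them from 0. It also enforces the 0 to 7 range that the SPS semantics give for resolution_id.

```python
def violates_order(a, b):
    """True if one resolution is wider but shorter than the other."""
    (wa, ha), (wb, hb) = a, b
    return (wa > wb and ha < hb) or (wb > wa and hb < ha)


def derive_resolution_ids(resolutions):
    """Map each distinct (width, height) to a resolution ID, 0 being the largest."""
    distinct = sorted(set(resolutions), reverse=True)   # by width, then height
    for i, a in enumerate(distinct):
        for b in distinct[i + 1:]:
            if violates_order(a, b):
                raise ValueError(f"{a} and {b} violate the ordering restriction")
    if len(distinct) > 8:
        raise ValueError("resolution_id is limited to the range 0..7")
    return {res: rid for rid, res in enumerate(distinct)}


ids = derive_resolution_ids([(960, 540), (1920, 1080), (1280, 720)])
print(ids)  # {(1920, 1080): 0, (1280, 720): 1, (960, 540): 2}
```

Because the restriction forbids "wider but shorter" pairs, sorting by width and then height yields the same order as comparing either dimension, so the derived IDs are unambiguous.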

For any RSS with a resolution ID equal to “rId,” during inter-prediction, prediction processing unit 41 may refer to decoded pictures only within the same RSS, within an RSS with a resolution ID equal to “rId−1,” or within an RSS with a resolution ID equal to “rId+1.” Prediction processing unit 41 may not refer to decoded pictures within other RSSs when performing inter-prediction.

In some examples, there may be additional restrictions on inter-prediction amongst RSSs. In one instance, for blocks of a given RSS, prediction processing unit 41 may perform inter-prediction only from the two adjacent RSSs, i.e., the RSS with the immediately larger spatial resolution and the RSS with the immediately smaller spatial resolution. In another example, prediction processing unit 41 may not be limited to spatially neighboring RSSs (e.g., RSSs with rId+1 or rId−1), and may perform inter-prediction using any RSS.
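The adjacency restriction on inter-prediction across RSSs reduces to a filter over the reference pictures available in the DPB. In the sketch below the DPB is modeled, as an assumption, as a list of `(picture_id, resolution_id)` pairs, and the function name is hypothetical; setting `adjacent_only=False` corresponds to the alternative in which any RSS may be used.

```python
def allowed_references(current_rid, dpb, adjacent_only=True):
    """Picture ids in `dpb` that a picture of RSS `current_rid` may reference."""
    return [pic_id for pic_id, rid in dpb
            if not adjacent_only or abs(rid - current_rid) <= 1]


dpb = [(0, 0), (1, 1), (2, 2), (3, 3)]   # (picture_id, resolution_id)
print(allowed_references(1, dpb))                        # [0, 1, 2]
print(allowed_references(3, dpb))                        # [2, 3]
print(allowed_references(3, dpb, adjacent_only=False))   # [0, 1, 2, 3]
```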

The techniques of this disclosure may also include processes and techniques for transmitting and activating picture parameter sets (PPSs). The use of PPSs may decouple the transmission of infrequently changing information from the transmission of coded block data for the CVSs. In some applications, video encoder 20 and video decoder 30 may convey or signal the SPSs and PPSs “out-of-band,” i.e., using a different communication channel than that used to communicate the coded block data of the CVSs, e.g., a reliable transport mechanism.

A PPS raw byte sequence payload (RBSP) may include parameters to which coded slice network abstraction layer (NAL) units of one or more coded pictures may refer. Each PPS RBSP is initially considered not active at the start of the decoding process. At most one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of the previously active PPS RBSP, if any.
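The PPS activation rule (none active initially, at most one active at a time, and activation implicitly deactivating the previous one) can be captured by a small state holder. The class and method names below are hypothetical, and the sketch ignores all PPS contents other than the ID.

```python
class PpsState:
    """Tracks received PPSs and the single active PPS, if any."""

    def __init__(self):
        self.received = {}      # pps_id -> parameters (contents not modeled)
        self.active_id = None   # no PPS is active when decoding starts

    def receive(self, pps_id, params):
        self.received[pps_id] = params

    def activate(self, pps_id):
        """Activate pps_id, deactivating the previous one; return the previous id."""
        if pps_id not in self.received:
            raise KeyError(f"PPS {pps_id} referenced before it was received")
        previous, self.active_id = self.active_id, pps_id
        return previous


state = PpsState()
state.receive(0, {})
state.receive(1, {})
print(state.activate(0))   # None
print(state.activate(1))   # 0
print(state.active_id)     # 1
```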

In some examples, prediction processing unit 41 of video encoder 20 and prediction processing unit 81 of video decoder 30 may support RSSs that all have the same resolution aspect ratio. In other examples, video encoder 20 and video decoder 30 may support RSSs whose resolution aspect ratios differ from one another. The resolution aspect ratio of an RSS may be defined as the ratio of the width of the pictures of the RSS to their height.

In the example where prediction processing units 41 and 81 support RSSs having different resolution aspect ratios, prediction processing units 41 and 81 may crop a portion of a block of a reference picture having a first resolution aspect ratio in order to predict the values of a predictive block having a second, different resolution aspect ratio. The techniques of this disclosure define a number of syntax elements, referred to as cropping parameters, which may be signaled in the RBSP of an SPS to indicate how a reference picture should be cropped. The cropped area of the reference picture may be referred to as a “cropping window.”
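One way an encoder might derive cropping parameters when the reference RSS and the current RSS differ in aspect ratio is sketched below. It assumes a centered crop of the reference picture, compares aspect ratios by cross-multiplication to stay in integer arithmetic, and returns a cropping indicator (1 for left/right, 2 for top/bottom, 0 for none) together with left, right, top, and bottom pixel counts. The function name and the centering policy are assumptions; the text does not say how an encoder chooses the window. Because it only matches aspect ratios, the sketch never crops on all four sides.

```python
def centered_crop_for_aspect(ref_w, ref_h, tgt_w, tgt_h):
    """Return (idc, left, right, top, bottom) that crops a ref_w x ref_h
    picture to the tgt_w:tgt_h aspect ratio, keeping the window centered."""
    if ref_w * tgt_h > ref_h * tgt_w:        # reference is wider: drop columns
        excess = ref_w - ref_h * tgt_w // tgt_h
        return 1, excess // 2, excess - excess // 2, 0, 0
    if ref_w * tgt_h < ref_h * tgt_w:        # reference is taller: drop rows
        excess = ref_h - ref_w * tgt_h // tgt_w
        return 2, 0, 0, excess // 2, excess - excess // 2
    return 0, 0, 0, 0, 0                     # aspect ratios already match


print(centered_crop_for_aspect(1920, 1080, 640, 480))   # (1, 240, 240, 0, 0)
print(centered_crop_for_aspect(640, 480, 1280, 720))    # (2, 0, 0, 60, 60)
```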

In order to support CVSs with adaptive resolution, the techniques of this disclosure propose adding the following syntax structures to the SPS. The syntax elements may include a profile indicator or a flag that indicates the existence of more than one spatial resolution in the CVS. Alternatively, no flag may be added, and the existence of more than one spatial resolution in the CVS may instead be indicated by a particular value of the profile indicator, which may be denoted as profile_idc. Additionally, the syntax elements may include a resolution ID, a syntax element that indicates a spatial relationship between the current resolution sub-sequence and an adjacent spatial resolution sub-sequence, and a syntax element that indicates the required size of the DPB in units of 8×8 blocks.

According to the techniques of this disclosure, a modified SPS RBSP syntax structure may be expressed as shown below in Table I:

TABLE I

seq_parameter_set_rbsp( ) {                                   Descriptor
  profile_idc                                                 u(8)
  reserved_zero_8bits /* equal to 0 */                        u(8)
  level_idc                                                   u(8)
  seq_parameter_set_id                                        ue(v)
  max_temporal_layers_minus1                                  u(3)
  pic_width_in_luma_samples                                   u(16)
  pic_height_in_luma_samples                                  u(16)
  bit_depth_luma_minus8                                       ue(v)
  bit_depth_chroma_minus8                                     ue(v)
  pcm_bit_depth_luma_minus1                                   u(4)
  pcm_bit_depth_chroma_minus1                                 u(4)
  log2_max_pic_order_cnt_lsb_minus4                           ue(v)
  max_num_ref_frames                                          ue(v)
  log2_min_coding_block_size_minus3                           ue(v)
  log2_diff_max_min_coding_block_size                         ue(v)
  log2_min_transform_block_size_minus2                        ue(v)
  log2_diff_max_min_transform_block_size                      ue(v)
  log2_min_pcm_coding_block_size_minus3                       ue(v)
  max_transform_hierarchy_depth_inter                         ue(v)
  max_transform_hierarchy_depth_intra                         ue(v)
  chroma_pred_from_luma_enabled_flag                          u(1)
  loop_filter_across_slice_flag                               u(1)
  sample_adaptive_offset_enabled_flag                         u(1)
  adaptive_loop_filter_enabled_flag                           u(1)
  pcm_loop_filter_disable_flag                                u(1)
  cu_qp_delta_enabled_flag                                    u(1)
  temporal_id_nesting_flag                                    u(1)
  inter_4x4_enabled_flag                                      u(1)
  adaptive_spatial_resolution_flag                            u(1)
  if( adaptive_spatial_resolution_flag ) {
    resolution_id                                             ue(v)
    for( i = 0; i < 2; i++ ) {
      cropping_resolution_idc[ i ]                            u(2)
      if( cropping_resolution_idc[ i ] & 0x01 ) {
        cropped_left[ i ]                                     ue(v)
        cropped_right[ i ]                                    ue(v)
      }
      if( cropping_resolution_idc[ i ] & 0x02 ) {
        cropped_top[ i ]                                      ue(v)
        cropped_bottom[ i ]                                   ue(v)
      }
    }
  }
  max_dec_pic_buffering                                       ue(v)
  rbsp_trailing_bits( )
}
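The conditional part of Table I can be read with a simple bit reader that supports u(n) and Exp-Golomb ue(v) codes. The sketch below assumes a reader already positioned at adaptive_spatial_resolution_flag within an RBSP whose emulation-prevention bytes have been removed, and it skips all earlier SPS fields. It tests bit 0x01 for left/right cropping and bit 0x02 for top/bottom cropping, following the cropping_resolution_idc semantics for values 1, 2, and 3 (a u(2) field can never have bit 0x10 set). Absent cropped_* values are set to 0. The class and function names are hypothetical.

```python
class BitReader:
    """Reads big-endian bit fields from an RBSP byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def u(self, n):
        """Fixed-length unsigned integer, u(n)."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def ue(self):
        """Unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def parse_adaptive_resolution_fields(r):
    """Parse Table I from adaptive_spatial_resolution_flag to max_dec_pic_buffering."""
    sps = {"adaptive_spatial_resolution_flag": r.u(1), "crop": []}
    if sps["adaptive_spatial_resolution_flag"]:
        sps["resolution_id"] = r.ue()
        for _ in range(2):
            entry = {"idc": r.u(2), "left": 0, "right": 0, "top": 0, "bottom": 0}
            if entry["idc"] & 0x01:   # left/right cropping values present
                entry["left"], entry["right"] = r.ue(), r.ue()
            if entry["idc"] & 0x02:   # top/bottom cropping values present
                entry["top"], entry["bottom"] = r.ue(), r.ue()
            sps["crop"].append(entry)
    sps["max_dec_pic_buffering"] = r.ue()
    return sps


# flag=1, resolution_id=1, idc[0]=0, idc[1]=1 (left=2, right=2), max_dec_pic_buffering=3
sps = parse_adaptive_resolution_fields(BitReader(b"\xa1\x6c\x80"))
print(sps["resolution_id"], sps["crop"][1], sps["max_dec_pic_buffering"])
```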

An exemplary description of the new SPS syntax elements in Table I is set forth in more detail below.

adaptive_spatial_resolution_flag: When equal to “1,” the flag indicates that a CVS containing an RSS referring to an SPS may contain pictures with different spatial resolutions. When equal to “0,” the flag indicates that all pictures in the CVS have the same spatial resolution, or equivalently, that there is only one RSS in the CVS. This syntax element applies to the entire CVS, and its value shall be identical for all SPSs that may be activated for the CVS.

The adaptive_spatial_resolution_flag is only one example of how adaptive resolution CVSs may be implemented. As another example, there may be one or more profiles defined that enable adaptive spatial resolution. Accordingly, the value of the profile_idc syntax element, which may indicate the selection of an adaptive resolution profile, may signal the enablement of adaptive resolution.

resolution_id: Specifies an identifier of the RSS referring to the SPS. The value of resolution_id may be in the range of “0” to “7,” inclusive. The RSS with the largest spatial resolution among all RSSs in the CVS may have resolution_id equal to “0.”

cropping_resolution_idc[i]: Indicates whether cropping is needed to specify a reference region of a reference picture from a target RSS, as defined below, used for inter-prediction as a reference when decoding a coded picture from a current RSS.

The pseudocode that follows describes one example of how the target RSS may be identified from the resolution_id of the RSS that refers to an SPS, according to the techniques of this disclosure.

-   Let “rId” be the resolution_id of the current RSS.
-   The target RSS is the RSS with a resolution_id equal to rId + (i == 0 ? −1 : 1).
-   If the current RSS has a resolution_id equal to 0, cropping_resolution_idc[0] = 0.
-   If the current RSS has the largest resolution_id among all RSSs in the CVS, cropping_resolution_idc[1] = 0.
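The target-RSS rule above can be written as a pair of small functions. The parameter `max_rid` (hypothetical, as are the function names) is the largest resolution_id among the RSSs of the CVS; a target outside 0..max_rid does not exist, in which case the corresponding cropping_resolution_idc entry must be 0.

```python
def target_resolution_id(rid, i, max_rid):
    """resolution_id of the target RSS for index i, or None if it does not exist."""
    target = rid + (-1 if i == 0 else 1)
    return target if 0 <= target <= max_rid else None


def cropping_idc_consistent(rid, max_rid, cropping_resolution_idc):
    """cropping_resolution_idc[i] must be 0 whenever its target RSS does not exist."""
    return all(cropping_resolution_idc[i] == 0
               for i in range(2)
               if target_resolution_id(rid, i, max_rid) is None)


print(target_resolution_id(0, 0, 2), target_resolution_id(0, 1, 2))  # None 1
print(cropping_idc_consistent(0, 2, [0, 3]))  # True
print(cropping_idc_consistent(2, 2, [1, 2]))  # False
```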

As described above, the techniques of this disclosure may enable RSSs and SPSs that have different aspect ratios. When performing inter-prediction, video encoder 20 may predict the pixel values of a block from a block of a reference picture that has a different aspect ratio. Because of the difference in aspect ratios, video encoder 20 may crop a portion of the reference block in order to obtain a block with a resolution aspect ratio similar to that of the predictive block. The following syntax elements describe how video encoder 20 may crop blocks to obtain blocks with different resolution aspect ratios.

A cropping_resolution_idc[i] value equal to “0” indicates that the target RSS does not exist, or that no cropping is needed.

A cropping_resolution_idc[i] value equal to “1” indicates that cropping at the left and/or right side is needed.

A cropping_resolution_idc[i] value equal to “2” indicates that cropping at the top and/or bottom is needed.

A cropping_resolution_idc[i] value equal to “3” indicates that cropping at both the left/right and the top/bottom is needed.

Table II below illustrates the various values of cropping_resolution_idc[i] and the corresponding indications.

TABLE II

cropping_resolution_idc[ i ]    Indication
0                               No cropping is needed
1                               Cropping may happen at the left and/or right side
2                               Cropping may happen at the top and/or bottom
3                               Cropping may happen at both left/right and top/bottom

In addition to the “cropping_resolution_idc” value, the RBSP of an SPS may also include syntax elements that indicate the number of pixels to be cropped from the top, bottom, left, and/or right of a reference picture from an RSS. These additional cropping syntax elements are described in further detail below.

cropped_left[i]: Specifies the number of pixels to be cropped at the left side of the luma component of the reference picture from the target RSS, to specify the reference region. When not present, video encoder 20 may infer the value to be equal to “0.”

cropped_right[i]: Specifies the number of pixels to be cropped at the right side of the luma component of the reference picture from the target RSS, to specify the reference region. When not present, video encoder 20 may infer the value to be equal to “0.”

cropped_top[i]: Specifies the number of pixels to be cropped at the top of the luma component of the reference picture from the target RSS, to specify the reference region. When not present, video encoder 20 may infer the value to be equal to “0.”

cropped_bottom[i]: Specifies the number of pixels to be cropped at the bottom of the luma component of the reference picture from the target RSS, to specify the reference region. When not present, video encoder 20 may infer the value to be equal to “0.”

In addition to signaling a bottom, top, left, and/or right cropping, video encoder 20 may signal the cropping window in other ways. As an example, video encoder 20 may signal the cropping window as the starting vertical and horizontal positions plus the width and height. As another example, video encoder 20 may signal the cropping window as the starting vertical and horizontal positions and the ending vertical and horizontal positions.
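The three ways of describing the cropping window carry the same information. The sketch below, with hypothetical function names, converts per-side crop amounts into a start position plus width and height, and from there into start and end positions; treating the end positions as inclusive is an assumption, since the text does not specify it.

```python
def window_from_offsets(pic_w, pic_h, left, right, top, bottom):
    """(x, y, width, height) of the region that remains after cropping."""
    return left, top, pic_w - left - right, pic_h - top - bottom


def window_to_corners(x, y, width, height):
    """(x_start, y_start, x_end, y_end), with inclusive end positions (assumed)."""
    return x, y, x + width - 1, y + height - 1


win = window_from_offsets(1920, 1080, left=240, right=240, top=0, bottom=0)
print(win)                      # (240, 0, 1440, 1080)
print(window_to_corners(*win))  # (240, 0, 1679, 1079)
```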

Before prediction processing unit 41 uses a decoded picture from the target RSS as a reference for a coded picture in the current RSS, prediction processing unit 41 may crop that decoded picture as specified by the above cropping syntax elements. Prediction processing unit 41 may also scale the cropped reference picture to the same resolution as the coded picture in the current RSS, and scale the motion vectors of the cropped block accordingly.
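The crop-then-scale step can be sketched as below. Nearest-neighbour resampling stands in for whatever interpolation filter an implementation would actually use, and the motion-vector scaling assumes that the vector is expressed on the cropped reference grid and must be mapped to the current picture's grid by the ratio of the two sizes. The function names are hypothetical.

```python
from fractions import Fraction


def crop_and_resample(pic, left, right, top, bottom, out_w, out_h):
    """Crop a 2-D luma array, then resize it with nearest-neighbour sampling."""
    rows = [row[left:len(row) - right] for row in pic[top:len(pic) - bottom]]
    in_h, in_w = len(rows), len(rows[0])
    return [[rows[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def scale_motion_vector(mv, ref_w, ref_h, cur_w, cur_h):
    """Map (mvx, mvy) from the cropped reference grid to the current grid."""
    mvx, mvy = mv
    return (round(mvx * Fraction(cur_w, ref_w)),
            round(mvy * Fraction(cur_h, ref_h)))


pic = [[4 * y + x for x in range(4)] for y in range(4)]
print(crop_and_resample(pic, 1, 1, 0, 0, out_w=2, out_h=2))  # [[1, 2], [9, 10]]
print(scale_motion_vector((12, -6), 1440, 1080, 960, 720))   # (8, -4)
```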

As described above, video encoder 20 may include DPB 64, which may contain decoded pictures. DPB management unit 65 may manage DPB 64. Each decoded picture contained within DPB 64 may be needed either for inter-prediction as a reference or for future output. In accordance with the techniques of this disclosure, DPB 64 may be modified to support adaptive-resolution CVSs, and more generally to store frames of different sizes.

In accordance with the techniques of this disclosure, DPB 64 may initially be empty (i.e., the DPB “fullness,” an indication of the proportion of DPB 64 that is unavailable to store decoded pictures, is set to “0”). When a decoded picture is stored in DPB 64, DPB management unit 65 may increment the fullness of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the picture. Similarly, when DPB management unit 65 removes a decoded picture from DPB 64, DPB management unit 65 may decrement the fullness of the DPB by the number of blocks (e.g., CUs or 8×8 pixel blocks) in the removed picture.

To support a DPB that uses a block count rather than a frame count to indicate its “fullness,” the RBSP of an SPS may include a syntax element that specifies the size of the DPB in 8×8 blocks. This syntax element, denoted max_dec_pic_buffering, specifies the required size of the decoded picture buffer (DPB), in units of 8×8 blocks, for decoding the CVS. The syntax element may apply to the entire CVS, and its value is identical for all SPSs that may be activated for the CVS. Further detail of the operation of the DPB is described with respect to FIG. 5, below.
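A minimal sketch of block-counted DPB bookkeeping follows. It assumes that capacity is given by max_dec_pic_buffering in 8×8 blocks and that each picture counts (width × height) >> 6 blocks. `BlockCountedDpb` and its methods are hypothetical names, not part of any specification.

```python
def blocks_8x8(width, height):
    """Number of 8x8 luma blocks in a picture: (width * height) >> 6."""
    return (width * height) >> 6


class BlockCountedDpb:
    """DPB whose fullness and capacity are measured in 8x8 blocks, not frames."""

    def __init__(self, max_dec_pic_buffering):
        self.capacity = max_dec_pic_buffering  # DPB size in 8x8 blocks
        self.fullness = 0                      # empty before any picture is stored
        self.pictures = {}                     # picture id -> (width, height)

    def can_store(self, width, height):
        return self.fullness + blocks_8x8(width, height) <= self.capacity

    def store(self, pic_id, width, height):
        if not self.can_store(width, height):
            raise RuntimeError("not enough free 8x8 blocks in the DPB")
        self.pictures[pic_id] = (width, height)
        self.fullness += blocks_8x8(width, height)

    def remove(self, pic_id):
        width, height = self.pictures.pop(pic_id)
        self.fullness -= blocks_8x8(width, height)


dpb = BlockCountedDpb(max_dec_pic_buffering=3 * blocks_8x8(1920, 1080))
dpb.store("a", 1920, 1080)
dpb.store("b", 1920, 1080)
dpb.store("c", 1920, 1080)
print(dpb.can_store(960, 540))  # False: all 97200 blocks are in use
dpb.remove("a")
print(dpb.can_store(960, 540))  # True
```

Counting blocks rather than frames means the space freed by one 1920×1080 picture (32400 blocks) can hold four 960×540 pictures (8100 blocks each). A frame count cannot express this.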

FIG. 3 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transformation unit 88, summer 90, decoded picture buffer (DPB) 92, and DPB management unit 93. Prediction processing unit 81 includes motion compensation unit 82 and intra prediction unit 84. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG. 2.

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra prediction unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current picture. When the video slice is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in decoded picture buffer 92. In some examples, video decoder 30 may construct List 0 and List 1 from the reference pictures identified in the derived reference picture set.

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.

Inverse quantization unit 86 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

After prediction processing unit 81 generates the predictive block for the current video block based on either inter- or intra-prediction, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform unit 88 with the corresponding predictive blocks generated by prediction processing unit 81. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. DPB management unit 93 may store the decoded video blocks of a given picture in decoded picture buffer 92, which stores reference pictures used for subsequent motion compensation. Decoded picture buffer 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.

In accordance with this disclosure, prediction processing unit 81 and DPB management unit 93 represent example units for performing the example functions described above. For example, prediction processing unit 81 may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution. Prediction processing unit 81 may also receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set. Prediction processing unit 81 may also use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.

As another example in accordance with the techniques of this disclosure, prediction processing unit 81 may also receive a first decoded frame of video data, wherein the first decoded frame is associated with a first resolution. DPB management unit 93 may determine whether DPB 92 is available to store the first decoded frame based on the first resolution and, in the event DPB 92 is available to store the first decoded frame, store the first decoded frame in DPB 92. DPB management unit 93 may then determine whether DPB 92 is available to store a second decoded frame of video data, wherein the second decoded frame is associated with a second resolution, based on the first resolution and the second resolution, and wherein the first decoded frame is different than the second decoded frame.

In general, video decoder 30 may perform any of the techniques of this disclosure. In some examples, video decoder 30 may perform some or all of the techniques described above with respect to video encoder 20 in FIG. 2. In some examples, video decoder 30 may perform the techniques described with respect to FIG. 2 in a reciprocal ordering or manner to that described with respect to video encoder 20.

FIGS. 4A-4D are conceptual diagrams that illustrate examples of a coded bitstream including coded video data in accordance with the techniques of this disclosure. As shown in FIG. 4A, a coded bitstream 400 may comprise one or more coded video sequences (CVSs), in particular, CVS 402 and CVS 404. As also shown in FIG. 4A, CVS 402 and CVS 404 may comprise one or more frames, or “pictures,” PIC_1 (0)-PIC_1 (N) and PIC_2 (0)-PIC_2 (M), respectively. As still further shown in FIG. 4A, CVS 402 and CVS 404 may each further comprise a single sequence parameter set (SPS), in particular, SPS1 and SPS2, respectively. As described above, each of SPS1 and SPS2 may define parameters for the corresponding one of CVS 402 and CVS 404, including LCU size, SCU size, and other syntax information for the respective CVS that is common to all frames, or “pictures,” within the CVS.

As shown in FIG. 4B, a particular CVS, CVS 406, may further comprise one or more picture parameter sets (PPSs), in particular, PPS1 and PPS2. As described above, each of PPS1 and PPS2 may define parameters for CVS 406, including syntax information that indicates picture resolution, that are common to one or more pictures within CVS 406, but not to all pictures within CVS 406. For example, syntax information included within each of PPS1 and PPS2, e.g., picture resolution syntax information, may apply to a subset of the pictures included within CVS 406. As one example, PPS1 may indicate picture resolution for PIC_1 (0)-PIC_1 (N), and PPS2 may indicate picture resolution for PIC_2 (0)-PIC_2 (M). Accordingly, CVS 406 may comprise pictures having different resolutions, wherein the picture resolution for a particular one or more pictures (e.g., PIC_1 (0)-PIC_1 (N)) within CVS 406 that share a common picture resolution may be specified by a corresponding one of PPS1 and PPS2.

In cases where pictures having different resolutions are alternated within a CVS in decoding order, e.g., in a resolution-adaptive CVS, a PPS may have to be signaled prior to each picture whose resolution differs from that of the previous picture in decoding order, to indicate the picture resolution for the currently decoded picture. Accordingly, in such cases, multiple PPSs may need to be signaled throughout decoding of the CVS, which may increase coding overhead.

As described above, a PPS RBSP may include parameters that can be referred to by coded slice NAL units of one or more coded pictures. Each PPS RBSP is initially considered not active at the start of the decoding process. In most examples, one PPS RBSP is considered active at any given moment during the decoding process, and activation of any particular PPS RBSP results in deactivation of the previously-active PPS RBSP, if any.

When a PPS RBSP (with a particular value of the pic_parameter_set_id syntax element) is not active, and is referred to by a coded slice NAL unit (using that particular value of pic_parameter_set_id), the PPS referred to by the pic_parameter_set_id is activated. This PPS RBSP is referred to as the “active PPS RBSP” until it is deactivated by the activation of another PPS. Video encoder 20 or decoder 30 may require a PPS with the referenced pic_parameter_set_id value to have been received before activating that PPS.

As an example of the PPS activation process, a NAL unit may refer to PPS1. Video encoder 20 or decoder 30 may activate PPS1 based on the reference to PPS1 in the NAL unit. PPS1 is then the active PPS RBSP. PPS1 remains the active PPS RBSP until a NAL unit references PPS2, at which point video encoder 20 or decoder 30 may activate PPS2. Once activated, PPS2 becomes the active PPS RBSP, and PPS1 is no longer the active PPS RBSP.
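The PPS activation walk-through above can be modeled as a tiny state machine. The following sketch makes several assumptions. Parameter sets are plain dicts keyed by pic_parameter_set_id. `PpsTracker`, `receive`, and `on_slice` are hypothetical names. The dict contents are placeholders, not real PPS fields.

```python
class PpsTracker:
    """Tracks which PPS is active; at most one is active at a time."""

    def __init__(self):
        self.received = {}     # pic_parameter_set_id -> parameter dict
        self.active_id = None  # no PPS is active at the start of decoding

    def receive(self, pps_id, params):
        self.received[pps_id] = params

    def on_slice(self, pps_id):
        """Called for each coded slice NAL unit that refers to pps_id."""
        if pps_id not in self.received:
            raise ValueError(f"PPS {pps_id} referenced before it was received")
        self.active_id = pps_id  # activating this PPS deactivates any other
        return self.received[pps_id]


tracker = PpsTracker()
tracker.receive(1, {"label": "PPS1"})
tracker.receive(2, {"label": "PPS2"})
tracker.on_slice(1)       # PPS1 becomes the active PPS RBSP
tracker.on_slice(2)       # PPS2 replaces it
print(tracker.active_id)  # 2
```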

Any PPS NAL unit that has the same pic_parameter_set_id value as the active PPS RBSP for a coded picture may have the same content as that of the active PPS RBSP for the coded picture. That is, if the pic_parameter_set_id of the PPS NAL unit is the same as that of the active PPS RBSP, the content of the active PPS RBSP may not change. There may be an exception to this rule, however. If a PPS NAL unit has the same pic_parameter_set_id as the active PPS RBSP, and the PPS NAL unit follows the last Video Coding Layer (VCL) NAL unit of the coded picture and precedes the first VCL NAL unit of another coded picture, then the content of the active PPS RBSP may change (e.g., the pic_parameter_set_id value may indicate a different set of parameters).

In accordance with the techniques of this disclosure, as shown in FIGS. 4C-4D, syntax information that indicates picture resolution for one or more pictures within a CVS, wherein the CVS comprises one or more pictures having different sizes, may be indicated using multiple SPSs for the CVS, rather than using a plurality of PPSs, as described above with reference to FIGS. 4A-4B.

An SPS RBSP may include parameters that can be referred to by one or more PPS RBSPs, or by one or more Supplemental Enhancement Information (SEI) NAL units containing a buffering period SEI message. Each SPS is initially considered not active at the start of the decoding process. At most one SPS may be considered active for each RSS at any given moment during the decoding process, and the activation of any particular SPS may result in the deactivation of the previously-active SPS for the same resolution sub-sequence, if any. Also, if there are “n” resolution sub-sequences within the CVS, at most “n” SPS RBSPs may be considered active for the entire CVS at any given moment during the decoding process.

When an SPS RBSP (with a particular value of seq_parameter_set_id) is not already active, and is referred to by the activation of a PPS RBSP (using that particular value of seq_parameter_set_id), or is referred to by an SEI NAL unit containing a buffering period SEI message (using that particular value of seq_parameter_set_id), the SPS RBSP is activated. This SPS RBSP may be referred to as the “active SPS RBSP” for the associated RSS (the RSS whose coded pictures refer to the active SPS RBSP through the PPS RBSPs), until it is deactivated by the activation of another SPS RBSP. Video encoder 20 or decoder 30 may require the SPS RBSP with a particular value of seq_parameter_set_id to be available to video encoder 20 or video decoder 30 prior to the activation of that SPS. Additionally, the SPS may remain active for the entire RSS in the CVS.
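Unlike PPS activation, SPS activation is tracked per resolution sub-sequence. With “n” RSSs, up to “n” SPSs can therefore be active at once. The following is a minimal sketch assuming each RSS is identified by a hypothetical label and that activation is triggered externally, e.g., when a PPS referring to the SPS is activated. `SpsTracker` and its methods are illustrative names only.

```python
class SpsTracker:
    """At most one active SPS per resolution sub-sequence (RSS)."""

    def __init__(self):
        self.received = {}       # seq_parameter_set_id -> SPS fields
        self.active_by_rss = {}  # RSS label (hypothetical) -> active seq_parameter_set_id

    def receive(self, sps_id, fields):
        self.received[sps_id] = fields

    def activate(self, sps_id, rss):
        """Activate sps_id for one RSS; returns the SPS id it replaced, if any."""
        if sps_id not in self.received:
            raise ValueError(f"SPS {sps_id} must be available before activation")
        previous = self.active_by_rss.get(rss)
        self.active_by_rss[rss] = sps_id  # deactivates only this RSS's previous SPS
        return previous

    def active_sps_ids(self):
        return set(self.active_by_rss.values())


tracker = SpsTracker()
tracker.receive(0, {"width": 1920, "height": 1080})
tracker.receive(1, {"width": 960, "height": 540})
tracker.activate(0, "rss1")
tracker.activate(1, "rss2")
print(sorted(tracker.active_sps_ids()))  # [0, 1]: one active SPS per RSS
```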

Additionally, because an instantaneous decoder refresh (IDR) access unit begins a new CVS, and an activated SPS RBSP may remain active for the entire RSS in the CVS, an SPS RBSP may only be activated by a buffering period SEI message when the buffering period SEI message is part of an IDR access unit.

Any SPS NAL unit containing the particular value of seq_parameter_set_id for the active SPS RBSP for an RSS in a CVS may have the same content as that of the active SPS RBSP for the RSS in the CVS, unless it follows the last access unit of the CVS and precedes the first VCL NAL unit and the first SEI NAL unit containing a buffering period SEI message (when present) of another CVS.

Also, if a PPS RBSP or an SPS RBSP is conveyed within the bitstream, these constraints impose an order constraint on the NAL units that contain the PPS RBSP or the SPS RBSP, respectively. Otherwise, if a PPS RBSP or SPS RBSP is conveyed by other means not specified in this disclosure, it should be available to the decoding process in a timely fashion such that these constraints are obeyed.

The constraints that are expressed on the relationship between the values of the syntax elements (and the values of variables derived from those syntax elements) in the SPS and PPS, and other syntax elements, are typically expressions of constraints that apply only to the active SPS and the active PPS. If any SPS RBSP is present that is not activated in the bitstream, its syntax elements usually have values that would conform to the specified constraints if it were activated by reference in an otherwise-conforming bitstream. If any PPS RBSP is present that is never activated in the bitstream, the syntax elements of the PPS RBSP may have values that would conform to the specified constraints if the PPS were activated by reference in an otherwise-conforming bitstream.

During the decoding process, the values of parameters of the active PPS and the active SPS may be considered to be in effect. For interpretation of SEI messages, the values of the parameters of the PPS and SPS that are active for the operation of the decoding process for the VCL NAL units of the primary coded picture in the same access unit may be considered in effect, unless otherwise specified in the SEI message semantics.

As one example, as shown in FIG. 4C, CVS 408 may include one or more SPSs, in particular, SPS1 and SPS2, that indicate picture resolution for PIC_1 (0), PIC_1 (1), etc., and PIC_2 (0), PIC_2 (1), etc., respectively. In other words, SPS1 indicates picture resolution information for PIC_1 (0), PIC_1 (1), etc., and SPS2 indicates picture resolution information for PIC_2 (0), PIC_2 (1), etc. In this example, CVS 408 may further comprise one or more PPSs (not shown), wherein the one or more PPSs may specify syntax information for one or more pictures of CVS 408, but wherein the one or more PPSs do not include any syntax information that indicates picture resolution for any of the one or more pictures of CVS 408.

In this example, SPS1 and SPS2 may indicate picture resolution information for all pictures within CVS 408, even in cases where pictures having different resolutions are alternated within CVS 408 in decoding order. Accordingly, after picture resolution information for all pictures within CVS 408 has been indicated using SPS1 and SPS2, no additional indication of that information may be needed.

As shown in FIG. 4C, the multiple SPSs, e.g., SPS1 and SPS2, may be located at the beginning of the corresponding CVS, e.g., CVS 408, prior to any of PIC_1 (0), PIC_1 (1) and PIC_2 (0), PIC_2 (1). Alternatively, as shown in FIG. 4D, an SPS that indicates picture resolution information for one or more pictures may be located before the first one of such pictures in decoding order. For example, as shown in FIG. 4D, SPS2 is located within CVS 410 prior to the first one of pictures PIC_2 (0), PIC_2 (1), etc., but after the first one of PIC_1 (0), PIC_1 (1), etc.
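Both placements in FIGS. 4C and 4D satisfy the same ordering rule: each SPS precedes, in decoding order, the first picture that relies on it. The following sketch checks that rule. It simplifies the bitstream to a list of `("SPS", id)` and `("PIC", sps_id)` tuples and has pictures name their SPS directly, whereas real pictures refer to it through a PPS. The helper name is hypothetical.

```python
def sps_precedes_pictures(units):
    """units: ("SPS", sps_id) or ("PIC", sps_id) tuples in decoding order."""
    seen = set()
    for kind, sps_id in units:
        if kind == "SPS":
            seen.add(sps_id)
        elif kind == "PIC" and sps_id not in seen:
            return False  # a picture appeared before the SPS describing its resolution
    return True


fig_4c = [("SPS", 1), ("SPS", 2), ("PIC", 1), ("PIC", 2), ("PIC", 1), ("PIC", 2)]
fig_4d = [("SPS", 1), ("PIC", 1), ("PIC", 1), ("SPS", 2), ("PIC", 2), ("PIC", 1)]
print(sps_precedes_pictures(fig_4c), sps_precedes_pictures(fig_4d))  # True True
```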

FIG. 5 is a conceptual diagram illustrating the operation of a decoded picture buffer of a hypothetical reference decoder (HRD) model in accordance with the techniques of this disclosure. FIG. 5 includes coded picture buffer (CPB) 502, decoded picture buffer (DPB) 504, and DPB management unit 506. DPB management unit 506 may remove a picture from coded picture buffer (CPB) 502. Video encoder 20 or decoder 30 may decode the picture, and DPB management unit 506 may store the decoded picture in decoded picture buffer 504. Based on various criteria, such as an output time, an output flag, or a picture count, DPB management unit 506 may remove a picture from DPB 504. In some cases, video encoder 20 or decoder 30 may output the decoded picture. CPB 502 may contain encoded pictures that are removed for decoding, so that video encoder 20 or decoder 30 may make available the decoded pictures that may be needed for inter-prediction as a reference, or for future output. In general, DPB 504 may have a maximum capacity. In previous video coding standards, this capacity was expressed as a maximum number of frames that can be stored in the DPB. However, to support adaptive-resolution CVSs, DPB management unit 506 may maintain a count of blocks contained within the DPB to measure the “fullness” of the DPB.

This disclosure describes techniques for removing decoded pictures from the DPB from at least two perspectives. In the first perspective, DPB management unit 506 of video decoder 30 may remove decoded pictures based on an output time if the pictures are intended for output. In the second perspective, DPB management unit 506 may remove decoded pictures based on picture order count (POC) values if the pictures are intended for output. In either perspective, DPB management unit 506 may remove, prior to decoding the current picture, decoded pictures that are not needed for output (i.e., already output or not intended for output) and that are not in the reference picture set. Although described with respect to video decoder 30, video encoder 20 and DPB management unit 506 of video encoder 20 may also perform any of the DPB management techniques described in this disclosure.

DPB 504 may include a plurality of buffers, and each buffer may store a decoded picture that is to be used as a reference picture or is held for future output. Initially, the DPB is empty (i.e., the DPB fullness is set to zero). In the described example techniques, the removal of decoded pictures from the DPB may occur before the decoding of the current picture, but after video decoder 30 parses the slice header of the first slice of the current picture.

In the first perspective, the following techniques may occur instantaneously at time $t_r(n)$, in the sequence given below, where $t_r(n)$ is the CPB removal time (i.e., decoding time) of access unit n containing the current picture. As used in this disclosure, techniques occurring instantaneously means that the HRD model assumes decoding of a picture is instantaneous, i.e., the time period for decoding a picture is equal to zero.

In the first perspective, video decoder 30 may invoke the derivation process for a reference picture set. If the current picture, which DPB management unit 506 may retrieve from CPB 502, is an IDR picture, DPB management unit 506 may remove all decoded pictures from DPB 504 and may set the DPB fullness to 0. If the current picture is not an IDR picture, DPB management unit 506 may remove from DPB 504 each picture that is not included in the reference picture set of the current picture and that either has an OutputFlag value equal to “0” or has a DPB output time less than or equal to the CPB removal time of the current picture n (i.e., $t_{o,dpb}(m) \le t_r(n)$, where m denotes the picture being removed). The OutputFlag may indicate whether video decoder 30 should output the picture (e.g., for display, or for transmission in the case of an encoder).

Whenever DPB management unit 506 removes a picture from DPB 504, DPB management unit 506 may decrement the fullness of DPB 504 by the number of 8×8 blocks in the picture, i.e., (pic_width_in_luma_samples * pic_height_in_luma_samples) >> 6.

After DPB management unit 506 has removed any pictures from the DPB, video decoder 30 may decode the received picture “n” and store it in the DPB. DPB management unit 506 may increment the DPB fullness by the number of 8×8 blocks in the stored decoded picture, i.e., (pic_width_in_luma_samples * pic_height_in_luma_samples) >> 6.
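The first-perspective removal and storage steps can be sketched as follows. The sketch applies the removal conditions summarized earlier for either perspective: a picture is removed only when it is outside the reference picture set and no longer needed for output. It assumes pictures are identified by POC and that the reference picture set is given as a set of POCs. `DecodedPicture`, `remove_before_decoding`, and `store_current` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DecodedPicture:
    poc: int
    width: int                # pic_width_in_luma_samples
    height: int               # pic_height_in_luma_samples
    output_flag: int          # OutputFlag
    dpb_output_time: float    # t_o,dpb(m)


def blocks_8x8(width, height):
    return (width * height) >> 6


def remove_before_decoding(dpb, fullness, current_is_idr, rps_pocs, t_r_n):
    """Remove pictures at t_r(n); returns (remaining pictures, updated fullness)."""
    if current_is_idr:
        return [], 0  # an IDR picture empties the DPB and resets the fullness
    remaining = []
    for pic in dpb:
        not_needed_for_output = pic.output_flag == 0 or pic.dpb_output_time <= t_r_n
        if pic.poc not in rps_pocs and not_needed_for_output:
            fullness -= blocks_8x8(pic.width, pic.height)
        else:
            remaining.append(pic)
    return remaining, fullness


def store_current(dpb, fullness, pic):
    """Store the decoded current picture and count its 8x8 blocks."""
    dpb.append(pic)
    return fullness + blocks_8x8(pic.width, pic.height)


dpb = [
    DecodedPicture(0, 1920, 1080, 1, 0.0),
    DecodedPicture(1, 960, 540, 1, 5.0),
    DecodedPicture(2, 960, 540, 0, 9.0),
]
fullness = sum(blocks_8x8(p.width, p.height) for p in dpb)  # 48600
dpb, fullness = remove_before_decoding(dpb, fullness, False, {1}, 2.0)
fullness = store_current(dpb, fullness, DecodedPicture(3, 1920, 1080, 1, 4.0))
print([p.poc for p in dpb], fullness)  # [1, 3] 40500
```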

Each picture may also have an OutputFlag, as described above. When the picture has an OutputFlag value equal to 1, the DPB output time of the picture, denoted $t_{o,dpb}(n)$, may be derived by the following equation:

$$t_{o,dpb}(n) = t_r(n) + t_c \cdot \mathrm{dpb\_output\_delay}(n)$$

In the equation, dpb_output_delay(n) may be the value of dpb_output_delay specified in the picture timing SEI message associated with access unit “n,” and $t_c$ is the clock tick of the HRD.

If the OutputFlag of the current picture is equal to “1” and $t_{o,dpb}(n) = t_r(n)$, video decoder 30 may output the current picture. Otherwise, if the value of OutputFlag is equal to “0,” video decoder 30 may not output the current picture. Otherwise (i.e., if OutputFlag is equal to “1” and $t_{o,dpb}(n) > t_r(n)$), video decoder 30 may output the current picture later, at time $t_{o,dpb}(n)$.
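The output-time derivation and the three-way output decision can be written out directly. The sketch uses exact rational arithmetic (`fractions.Fraction`) so that the equality test $t_{o,dpb}(n) = t_r(n)$ is not disturbed by floating-point rounding. The clock tick of 1/50 s is an assumed example value, and the function names are hypothetical.

```python
from fractions import Fraction


def dpb_output_time(t_r, t_c, dpb_output_delay):
    """t_o,dpb(n) = t_r(n) + t_c * dpb_output_delay(n)."""
    return t_r + t_c * dpb_output_delay


def output_action(output_flag, t_r, t_o_dpb):
    """Decide what happens to the current picture at its CPB removal time t_r(n)."""
    if output_flag == 1 and t_o_dpb == t_r:
        return "output now"
    if output_flag == 0:
        return "not output"
    return "output later"  # OutputFlag == 1 and t_o_dpb > t_r


t_c = Fraction(1, 50)  # assumed clock tick of 1/50 s
t_r = Fraction(3, 25)
print(output_action(1, t_r, dpb_output_time(t_r, t_c, 0)))  # output now
print(output_action(1, t_r, dpb_output_time(t_r, t_c, 2)))  # output later
```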

As described above, in some examples, video decoder 30 may crop the picture in the decoded picture buffer. Video decoder 30 may use the cropping rectangle specified in the active sequence parameter set for the picture to determine the cropping rectangle.

In some examples, video decoder 30 may determine the difference between the DPB output time for a picture and the DPB output time for the picture following it in output order. When picture “n” is a picture that is output and is not the last picture of the bitstream that is output, this difference for picture “n,” denoted $\Delta t_{o,dpb}(n)$, may be defined according to the following equation:

$$\Delta t_{o,dpb}(n) = t_{o,dpb}(n_n) - t_{o,dpb}(n)$$

In the preceding equation, $n_n$ denotes the picture that follows picture “n” in output order and has OutputFlag equal to 1.
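The following sketch computes $\Delta t_{o,dpb}(n)$ for a list of pictures already arranged in output order. Pictures with OutputFlag equal to 0 are skipped when finding $n_n$, and the last output picture gets no entry. Times are assumed to be in clock ticks, and the function name is hypothetical.

```python
def output_time_deltas(pictures):
    """pictures: (output_flag, t_o_dpb) pairs listed in output order.

    Returns delta t_o,dpb(n) for every output picture except the last one.
    """
    times = [t for flag, t in pictures if flag == 1]
    return [t_next - t_cur for t_cur, t_next in zip(times, times[1:])]


pictures = [(1, 0), (0, 1), (1, 2), (1, 5)]  # output times in clock ticks
print(output_time_deltas(pictures))  # [2, 3]
```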

In the second perspective for removing decoded pictures, the HRD may apply the techniques instantaneously when DPB management unit 506 removes an access unit from CPB 502. Again, video decoder 30 and DPB management unit 506 of video decoder 30 may implement the removal of decoded pictures from DPB 504, and video decoder 30 does not necessarily include CPB 502. In some examples, video decoder 30 and video encoder 20 may not require CPB 502. Rather, CPB 502 is described as part of the HRD model for purposes of illustration only.

As above, in the second perspective for removing decoded pictures, DPB management unit 506 may remove pictures from the DPB before the decoding of the current picture, but after parsing the slice header of the first slice of the current picture. Also, when the current picture is an IDR picture, video decoder 30 and DPB management unit 506 may perform, in the second perspective, functions similar to those described above with respect to the first perspective.

Otherwise, if the current picture is not an IDR picture, DPB management unit 506 may empty, without output, the buffers of the DPB that store a picture that is both marked as “not needed for output” and not included in the reference picture set of the current picture. DPB management unit 506 may also decrement the DPB fullness by the number of 8×8 blocks in the pictures stored in the emptied buffers. When there is no empty buffer (i.e., the DPB fullness is equal to the DPB size), DPB management unit 506 may invoke the “bumping” process described below. In some examples, when there is no empty buffer, DPB management unit 506 may invoke the bumping process repeatedly until there is an empty buffer in which video decoder 30 can store the current decoded picture.

In general, video decoder 30 may perform the bumping process as follows. Video decoder 30 may first determine the picture to be output. For example, video decoder 30 may select the picture having the smallest PicOrderCnt (POC) value of all the pictures in DPB 504 that are marked as “needed for output.” Video decoder 30 may crop the selected picture using the cropping rectangle specified in the active sequence parameter set for the picture. Video decoder 30 may output the cropped picture, and may mark the picture as “not needed for output.” Video decoder 30 may then check the buffer of DPB 504 that stored the cropped and output picture. If the picture is not included in the reference picture set, DPB management unit 506 may empty that buffer and may decrement the DPB fullness by the number of 8×8 blocks in the removed picture.
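The bumping loop can be sketched with block-counted fullness. The sketch repeats the bumping process until the current picture's 8×8 blocks fit within the DPB size. It omits cropping and uses an `output` callback in its place. It assumes DPB size and fullness are both expressed in 8×8 blocks. `DpbEntry` and `bump_until_fits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DpbEntry:
    poc: int
    width: int
    height: int
    needed_for_output: bool


def blocks_8x8(width, height):
    return (width * height) >> 6


def bump_until_fits(dpb, fullness, capacity, cur_width, cur_height, rps_pocs, output):
    """Repeat the bumping process until the current picture fits; returns new fullness."""
    need = blocks_8x8(cur_width, cur_height)
    while fullness + need > capacity:
        candidates = [p for p in dpb if p.needed_for_output]
        if not candidates:
            raise RuntimeError("DPB cannot make room for the current picture")
        pic = min(candidates, key=lambda p: p.poc)  # smallest PicOrderCnt first
        output(pic)                                 # cropping omitted in this sketch
        pic.needed_for_output = False
        if pic.poc not in rps_pocs:                 # reference pictures stay stored
            dpb[:] = [p for p in dpb if p is not pic]
            fullness -= blocks_8x8(pic.width, pic.height)
    return fullness


dpb = [DpbEntry(4, 1920, 1080, True), DpbEntry(2, 1920, 1080, True),
       DpbEntry(6, 1920, 1080, True)]
out = []
fullness = bump_until_fits(dpb, 97200, 97200, 1920, 1080, {2}, lambda p: out.append(p.poc))
print(out, [p.poc for p in dpb], fullness)  # [2, 4] [2, 6] 64800
```

POC 2 is output first but stays in the DPB because it is in the reference picture set. Only the second bump, which outputs POC 4, frees space for the current picture.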

Although the above techniques for DPB management are described in the context of video decoder 30 and DPB management unit 93, in some examples, video encoder 20 and DPB management unit 65 may implement similar techniques. However, video encoder 20 implementing similar techniques is not required in every example. In some examples, video decoder 30 may implement these techniques, and video encoder 20 may not implement these techniques.

In this manner, a video coder (e.g., video encoder 20 or video decoder 30) may implement techniques to support CVSs having adaptive resolution. Again, the reference picture set may identify the reference pictures that can potentially be used for inter-predicting the current picture and that can potentially be used for inter-predicting one or more pictures following the current picture in decoding order.

In the above examples, the DPB size or fullness may be signaled with respect to the number of 8×8 blocks of a picture stored in the DPB. Alternatively, the DPB size, i.e., the max_dec_pic_buffering syntax element, may be signaled based on the number of smallest coding units (SCUs) of a picture. For example, if the smallest SCU among all active SPSs is 16×16, then the unit of max_dec_pic_buffering may be 16×16 blocks.
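The following sketch shows the SCU-based alternative. The counting unit is the smallest SCU among the active SPSs, and each picture occupies a number of such units. Rounding partial units up, for picture dimensions that are not a multiple of the unit, is an assumption this disclosure does not address. The function names are hypothetical.

```python
def ceil_div(a, b):
    return -(-a // b)


def dpb_unit_size(active_scu_sizes):
    """Side length of the DPB counting unit: the smallest SCU over all active SPSs."""
    return min(active_scu_sizes)


def picture_units(width, height, unit):
    """Units occupied by one picture, rounding partial units up (assumed)."""
    return ceil_div(width, unit) * ceil_div(height, unit)


unit = dpb_unit_size([16, 32])            # 16x16 blocks
print(picture_units(1920, 1080, unit))    # 8160 (120 x 68)
```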

As still another example, video encoder 20 or decoder 30 may signal the DPB size, indicated by the max_dec_pic_buffering syntax element, using units of frame buffers that are specific to the spatial resolution indicated by the SPS. For example, suppose there are two RSSs, rss1 and rss2, with resolutions res1 and res2, referring to SPS sps1 and SPS sps2, respectively, where res1 is greater than res2. Then max_dec_pic_buffering in sps1 is counted in frame buffers of res1, and max_dec_pic_buffering in sps2 is counted in frame buffers of res2. In this example, video encoder 20 or decoder 30 may be subject to the restriction that the DPB size indicated by the max_dec_pic_buffering value in sps1, when counted in units of 8×8 blocks, may not be less than that indicated by the max_dec_pic_buffering value in sps2. Consequently, in the DPB operations, when video decoder 30 removes one frame buffer of res1 from DPB 504, the freed buffer space may be sufficient for insertion of a decoded picture of either resolution. However, when video decoder 30 removes one frame buffer of res2 from DPB 504, the freed buffer space may not be sufficient for insertion of a decoded picture of res1. Rather, video decoder 30 may need to remove multiple frame buffers of res2 from DPB 504 in this case.
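The following sketch works through the resolution-specific frame-buffer variant. It converts each SPS's max_dec_pic_buffering into 8×8 blocks, checks the restriction between sps1 and sps2, and counts how many res2 buffers must be freed to fit one res1 picture. It assumes no other free space is available. The SPS tuple layout and all function names are hypothetical.

```python
def blocks_8x8(width, height):
    return (width * height) >> 6


def dpb_size_in_blocks(max_dec_pic_buffering, width, height):
    """Convert a DPB size counted in frame buffers of one resolution to 8x8 blocks."""
    return max_dec_pic_buffering * blocks_8x8(width, height)


def satisfies_size_constraint(sps_high, sps_low):
    """sps_* = (max_dec_pic_buffering, width, height); sps_high has the larger resolution."""
    return dpb_size_in_blocks(*sps_high) >= dpb_size_in_blocks(*sps_low)


def low_res_buffers_to_free(res_high, res_low):
    """How many res_low frame buffers must be removed to fit one res_high picture."""
    need, per_buffer = blocks_8x8(*res_high), blocks_8x8(*res_low)
    return ceil_div(need, per_buffer)


def ceil_div(a, b):
    return -(-a // b)


print(satisfies_size_constraint((4, 1920, 1080), (6, 960, 540)))  # True: 129600 >= 48600
print(low_res_buffers_to_free((1920, 1080), (960, 540)))          # 4
```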

Video decoder 30 may derive the reference picture set in any manner, including the example techniques described above. Video decoder 30 may determine whether a decoded picture stored in the decoded picture buffer is not needed for output and is not identified in the reference picture set. When video decoder 30 has output the decoded picture and the decoded picture is not identified in the reference picture set, video decoder 30 may remove the decoded picture from the decoded picture buffer. Subsequent to removing the decoded picture, video decoder 30 may code the current picture. For example, video decoder 30 may construct the reference picture list(s) as described above, and code the current picture based on the reference picture list(s).

FIG. 6 is a flowchart illustrating an example operation of using a first sub-sequence and a second sub-sequence to decode video in accordance with the techniques of this disclosure. For purposes of illustration only, the method of FIG. 6 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30. In the method of FIG. 6, the video coder may process a coded video sequence comprising a first sub-sequence and a second sub-sequence (601). The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution.

The video coder (e.g., video encoder 20 or video decoder 30) may also process a first sequence parameter set (SPS) and a second sequence parameter set for the coded video sequence (602). The first sequence parameter set may indicate the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set may indicate the second resolution of the one or more frames of the second sub-sequence. The first sequence parameter set may also be different than the second sequence parameter set. The video coder (e.g., video encoder 20 or video decoder 30) may use the first sequence parameter set and the second sequence parameter set to code the coded video sequence (603).

In some examples, the video coder may comprise an encoder, e.g., encoder 20 of FIGS. 1-2, and in other examples the video coder may comprise a decoder, e.g., video decoder 30. In the case where the video coder comprises a decoder, processing the SPSs and sub-sequences may comprise receiving the SPSs and sub-sequences. In this case, coding the first and second sub-sequences may comprise decoding the first and second sub-sequences.

In the case where the video coder comprises an encoder, processing the SPSs and sub-sequences may comprise generating the SPSs and sub-sequences. In this case, coding the first and second sub-sequences may comprise encoding the first and second sub-sequences. Additionally, in the case where the video coder comprises an encoder, the video encoder may transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, rather than receiving the coded video sequence. In some examples, the first resolution and the second resolution may each comprise a spatial resolution.

In some examples, the first sequence parameter set and the second sequence parameter set may be coded in a received bitstream prior to either the first sub-sequence or the second sub-sequence.

In another example, to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the video coder may be configured to receive both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.

In another example, the first sequence parameter set may be coded in a received bitstream prior to the first sub-sequence, and the second sequence parameter set may be coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.

In another example, to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the video coder may be configured to receive the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
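The orderings described above (both SPSs before either sub-sequence, or the second SPS after some frames of the first sub-sequence but before the second sub-sequence) share one constraint: every frame must be preceded in the bitstream by the SPS it refers to. A minimal check of that constraint might look like the following sketch, which assumes a hypothetical bitstream model in which each unit is a `(kind, sps_id)` tuple with `kind` equal to `"SPS"` or `"FRAME"`.

```python
def sps_precede_frames(units: list[tuple[str, int]]) -> bool:
    """Check that every frame's SPS appears earlier in the bitstream.

    `units` is a hypothetical bitstream model: a list of (kind, sps_id)
    tuples, where kind is "SPS" or "FRAME".
    """
    seen: set[int] = set()
    for kind, sps_id in units:
        if kind == "SPS":
            seen.add(sps_id)
        elif kind == "FRAME" and sps_id not in seen:
            return False  # frame refers to an SPS not yet received
    return True


# Both SPSs sent before either sub-sequence.
upfront = [("SPS", 0), ("SPS", 1), ("FRAME", 0), ("FRAME", 0), ("FRAME", 1)]
# Second SPS sent after a frame of the first sub-sequence, before the second.
midstream = [("SPS", 0), ("FRAME", 0), ("SPS", 1), ("FRAME", 1)]
```

Both `upfront` and `midstream` satisfy the check, whereas a bitstream in which a frame of the second sub-sequence arrives before the second SPS does not.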

In yet another example, the video coder may interleave the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence in the coded video sequence.
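Interleaving can be illustrated with a simple round-robin merge. The disclosure does not specify an interleaving pattern, so the alternating order below is an assumption; the point of the sketch is that each frame keeps its own SPS identifier, so the two sub-sequences remain separable after merging. Frames are modeled as hypothetical `(sps_id, index)` tuples.

```python
from itertools import chain, zip_longest


def interleave(first: list, second: list) -> list:
    """Alternate frames from two sub-sequences, appending any leftovers in order."""
    missing = object()  # sentinel marking positions where one list ran out
    pairs = zip_longest(first, second, fillvalue=missing)
    return [f for f in chain.from_iterable(pairs) if f is not missing]


first = [(0, 0), (0, 1), (0, 2)]  # frames referring to SPS 0
second = [(1, 0), (1, 1)]         # frames referring to SPS 1
merged = interleave(first, second)
```

Filtering `merged` by SPS identifier recovers each sub-sequence in its original order, which is what allows a decoder to apply the correct resolution to every frame.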

FIG. 7 is a flowchart illustrating an example operation of managing a decoded picture buffer. For purposes of illustration only, the method of FIG. 7 may be performed by a video coder corresponding to either video encoder 20 or video decoder 30. In the method of FIG. 7, a video coder may receive a coded video sequence comprising a first sub-sequence and a second sub-sequence (701). The first sub-sequence may include one or more frames each having a first resolution, and the second sub-sequence may include one or more frames each having a second resolution. The first sub-sequence may be different than the second sub-sequence, and the first resolution may be different than the second resolution. The video coder may receive a first decoded frame of video data, and the first decoded frame may be associated with the first resolution. In some examples, the resolution may comprise a spatial resolution.

In accordance with the method illustrated in FIG. 7, the video coder may also determine whether a decoded picture buffer is available to store the first decoded frame based on the first resolution (702). In the event the decoded picture buffer is available to store the first decoded frame, the video coder may store the first decoded frame in the decoded picture buffer, and determine whether the decoded picture buffer is available to store a second decoded frame of video data. The second decoded frame of video data may be associated with a second resolution. The video coder may also determine whether the decoded picture buffer is available to store the second decoded frame based on the first resolution and the second resolution (704). The first decoded frame may also be different than the second decoded frame.

In some examples, to determine whether the decoded picture buffer is available to store the first decoded frame based on the first resolution, the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer, determine an amount of information associated with the first decoded frame based on the first resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the first decoded frame.
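The comparison above can be made concrete by deriving a frame's size from its resolution. The sketch assumes a 4:2:0 chroma format with each sample stored in whole bytes, and a buffer whose free space is tracked in bytes; neither the format nor the byte-based accounting is specified by the disclosure, and `frame_size_bytes` and `dpb_can_store` are hypothetical names.

```python
def frame_size_bytes(width: int, height: int, bit_depth: int = 8) -> int:
    """Bytes for one 4:2:0 frame: full-size luma plus two subsampled chroma planes."""
    bytes_per_sample = (bit_depth + 7) // 8
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


def dpb_can_store(free_bytes: int, width: int, height: int, bit_depth: int = 8) -> bool:
    """Compare the buffer's free space with the size implied by the frame's resolution."""
    return frame_size_bytes(width, height, bit_depth) <= free_bytes
```

Under these assumptions an 8-bit 1920x1080 frame needs 3,110,400 bytes, so it fits only if at least that much space is free.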

In an example, to determine whether the decoded picture buffer is available to store the second decoded frame based on the first resolution and the second resolution, the video coder may be configured to determine an amount of information that may be stored within the decoded picture buffer based on the first resolution, determine an amount of information associated with the second decoded frame based on the second resolution, and compare the amount of information that may be stored within the decoded picture buffer and the amount of information associated with the second decoded frame.
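For the second frame, the space still available depends on what the first frame already occupies, which is why the check uses both resolutions. A minimal sketch under the same assumptions as before (8-bit 4:2:0 frames, byte-based accounting, hypothetical function names):

```python
def frame_size_bytes(width: int, height: int) -> int:
    """Bytes for one 8-bit 4:2:0 frame: full-size luma plus two subsampled chroma planes."""
    return width * height + 2 * ((width + 1) // 2) * ((height + 1) // 2)


def dpb_can_store_second(capacity_bytes: int,
                         first_res: tuple[int, int],
                         second_res: tuple[int, int]) -> bool:
    """Check whether the second frame fits once the first frame is stored.

    The space left over depends on the first resolution; the space needed
    depends on the second resolution.
    """
    remaining = capacity_bytes - frame_size_bytes(*first_res)
    return frame_size_bytes(*second_res) <= remaining
```

With a 4,000,000-byte buffer holding a 1920x1080 frame (3,110,400 bytes), a 960x540 frame (777,600 bytes) still fits, whereas a 1280x720 frame (1,382,400 bytes) does not. Counting bytes rather than picture slots is what lets a smaller second-resolution frame fit where a full-size one would not.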

In some examples, the video coder may be further configured to remove the first decoded frame from the decoded picture buffer. The video coder may also be an encoder, e.g., encoder 20 of FIGS. 1-2, or a decoder, e.g., decoder 30 of FIGS. 1-2, in some examples.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

What is claimed is:
 1. A method of decoding video data, the method comprising: receiving a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
 2. The method of claim 1, wherein the first sequence parameter set and the second sequence parameter set are coded in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
 3. The method of claim 1, wherein receiving the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises: receiving both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
 4. The method of claim 1, wherein the first sequence parameter set is coded in a received bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
 5. The method of claim 1, wherein receiving the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises: receiving the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
 6. The method of claim 1, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
 7. The method of claim 1, wherein the first resolution and the second resolution each comprise a spatial resolution.
 8. An apparatus for decoding video data, the apparatus comprising a video decoder configured to: receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
 9. The apparatus of claim 8, wherein the first sequence parameter set and the second sequence parameter set are coded in a received bitstream prior to either the first sub-sequence or the second sub-sequence.
 10. The apparatus of claim 8, wherein to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to: receive both the first sequence parameter set and the second sequence parameter set prior to receiving either of the first sub-sequence and the second sub-sequence.
 11. The apparatus of claim 8, wherein the first sequence parameter set is coded in a received bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the received bitstream after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.
 12. The apparatus of claim 8, wherein to receive the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to: receive the second sequence parameter set after receiving at least one frame of the one or more frames of the first sub-sequence, and prior to receiving the second sub-sequence.
 13. The apparatus of claim 8, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
 14. The apparatus of claim 8, wherein the first resolution and the second resolution each comprise a spatial resolution.
 15. An apparatus for decoding video data, the apparatus comprising: means for receiving a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; means for receiving a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and means for using the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
 16. A computer-readable storage medium comprising instructions that, when executed, cause at least one processor to decode video data, wherein the instructions cause the at least one processor to: receive a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; receive a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and use the first sequence parameter set and the second sequence parameter set to decode the coded video sequence.
 17. A method of encoding video data, the method comprising: generating a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; generating a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and transmitting the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
 18. The method of claim 17, wherein the first sequence parameter set and the second sequence parameter set are coded in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence.
 19. The method of claim 17, wherein transmitting the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises: transmitting both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence.
 20. The method of claim 17, wherein the first sequence parameter set is coded in a transmitted bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
 21. The method of claim 17, wherein transmitting the first sequence parameter set and the second sequence parameter set of the coded video sequence comprises: transmitting the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.
 22. The method of claim 17, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
 23. The method of claim 17, wherein the first resolution and the second resolution each comprise a spatial resolution.
 24. An apparatus for coding video data, the apparatus comprising a video coder configured to: generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; generate a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
 25. The apparatus of claim 24, wherein the first sequence parameter set and the second sequence parameter set are coded in a transmitted bitstream prior to either the first sub-sequence or the second sub-sequence.
 26. The apparatus of claim 24, wherein to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to: transmit both the first sequence parameter set and the second sequence parameter set prior to transmitting either of the first sub-sequence and the second sub-sequence.
 27. The apparatus of claim 24, wherein the first sequence parameter set is coded in a transmitted bitstream prior to the first sub-sequence and the second sequence parameter set is coded in the transmitted bitstream after at least one frame of the one or more frames of the first sub-sequence and prior to the second sub-sequence.
 28. The apparatus of claim 24, wherein to transmit the first sequence parameter set and the second sequence parameter set of the coded video sequence, the apparatus is configured to: transmit the second sequence parameter set after transmitting at least one frame of the one or more frames of the first sub-sequence, and prior to transmitting the second sub-sequence.
 29. The apparatus of claim 24, wherein the one or more frames of the first sub-sequence and the one or more frames of the second sub-sequence are interleaved in the coded video sequence.
 30. The apparatus of claim 24, wherein the first resolution and the second resolution each comprise a spatial resolution.
 31. An apparatus for encoding video data, the apparatus comprising: means for generating a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; means for generating a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and means for transmitting the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
 32. A computer readable storage medium comprising instructions that, when executed, cause at least one processor of a video encoding device to: generate a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; generate a first sequence parameter set and a second sequence parameter set for the video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set; and transmit the coded video sequence comprising the first sub-sequence and the second sub-sequence, and the first sequence parameter set and the second sequence parameter set.
 33. A computer readable storage medium, comprising a data structure stored thereon, the data structure comprising: a coded video sequence comprising a first sub-sequence and a second sub-sequence, wherein the first sub-sequence includes one or more frames each having a first resolution, and the second sub-sequence includes one or more frames each having a second resolution, and wherein the first sub-sequence is different than the second sub-sequence, and the first resolution is different than the second resolution; and a first sequence parameter set and a second sequence parameter set for the coded video sequence, wherein the first sequence parameter set indicates the first resolution of the one or more frames of the first sub-sequence, and the second sequence parameter set indicates the second resolution of the one or more frames of the second sub-sequence, and wherein the first sequence parameter set is different than the second sequence parameter set.
 34. The computer readable medium of claim 33, wherein the first sequence parameter set and the second sequence parameter set are coded in a bitstream on the data structure prior to either the first sub-sequence or the second sub-sequence.
 35. The computer readable medium of claim 33, wherein the first sequence parameter set is coded in a bitstream on the data structure prior to the first sub-sequence and the second sequence parameter set is coded in the bitstream on the data structure after at least one frame of the one or more frames of the first sub-sequence, and prior to the second sub-sequence.